Oracle Scratchpad

January 27, 2022

Exadata Catalogue

Filed under: Oracle — Jonathan Lewis @ 6:18 pm GMT Jan 27,2022

This is a list of all articles I’ve written that talk about Exadata, with a date stamp and short note of the contents. The articles are listed most-recent first.

I am currently working backwards through my entire blog adding articles to catalogues as I go, so any individual catalogue may grow very slowly.

Other authors:

  •  
  •  

September 14, 2010

Exadata

Filed under: Exadata,Infrastructure — Jonathan Lewis @ 6:58 pm BST Sep 14,2010

Inevitably there’s still excitement and uncertainty in the air about Exadata – and/or the Database Machine. For ease of reference and a jolly good read I’ve started collecting a few blog postings about it:

Jason Arneil’s Blog

Pythian

Kerry Osborne

Kevin Closson

Tanel Poder

Uwe Hesse

Arup Nanda

Oracle Official Links

 

January 28, 2022

Hinting Catalogue

Filed under: Oracle — Jonathan Lewis @ 2:38 pm GMT Jan 28,2022

This is a list of all articles I’ve written that talk about hints. Each entry has a date stamp and a short note of the contents. The articles are (may be / will be) grouped in a couple of different categories, but otherwise are generally listed most-recent first.

I am currently working backwards through my entire blog adding articles to catalogues as I go, so any individual catalogue may grow very slowly.

  • opt_estimate() catalogue: a separate catalogue of articles about the (undocumented) opt_estimate() hint and loosely related hints.
  • ignoring hints catalogue: a separate catalogue of articles about how Oracle doesn’t ignore hints, or examples of bugs, or explanations of the difficulty of hinting correctly.
  • Case Study (June 2022): an example of the SQL Monitor report showing that a no_merge hint was the appropriate solution
  • System_stats() hint (Jan 2022): for Exadata, the optimizer can be told to override the stored system stats.
  • Hash Aggregation 1 (Jan 2022): you may need to hint hash aggregation for some queries that group by then order by the same columns
  • use_nl() redux (Oct 2021): where in its processing does Oracle apply join method hints like use_nl()
  • Ordered hint (Sept 2021): you shouldn’t be using it, and this note shows a surprising “place group by” side effect
  • Distributed Query (Aug 2021): A real tangle when using the rule, driving_site and materialize hints (badly) together
  • Driving_site() in a patch (May 2021): specific example of a general problem – changing an inline hint to an SQL Patch may require a target query block name and a fully qualified table alias.
  • use_nl_with_index (Mar 2021): a description of what it can do, and one (current) omission
  • Index descending First N (Dec 2020): until 19c an first rows in descending order had to be hinted with index_desc()
  • Using the num_index_keys() hint (Nov 2018): an undocumented hint that adjusts the access/filter strategy for index range scans in special cases.
  • Five Hints (Dec 2015): the paper I wrote for a DOAG presentation on the (then) top 5 most most important hints.
  • New Index hints (June 2013): Analysis of how Oracle handles “near misses” on hinting by index description.
  • New Index hints (Nov 2011): Oracle 10g introduced an index hinting by description. Some examples and comments.
  • v$sql_shared_memory (Jan 2009): Even Oracle’s developers get the use_nl() hint wrong.

January 27, 2022

Statistics catalogue

Filed under: Oracle — Jonathan Lewis @ 6:16 pm GMT Jan 27,2022

This is a list of all articles I’ve written that talk about statistics, with a date stamp and short note of the contents. I have also created a couple of sub-catalogues around major topics:

I am currently working backwards through my entire blog adding articles to catalogues as I go, so any individual catalogue may grow very slowly.

  • 19c – better stats (April 2022): Oracle now uses approx_count_distinct() on the whole index in dbms_stats.gather_index_stats().
  • system_stats() hint (Jan 2022): notes on a hint that allows you to set the “noworkload” system statistics in a hint on Exadata.
  • Optimizer Tip (Sept 2021): reprint of a note for IOUG 2015 on column groups and the table_cached_blocks preference
  • Stats preferences (Aug 2021): a simple query to report all the system and table preferences, default or explicitly set
  • Using the num_index_keys() hint (Nov 2018): controlling the split between access and filter predicates used in the index for complex in-lists.
  • Reporting times of stats operations (Sept 2018): dbms_stats functions to report details of stats gathering operations.
  • table_cached_blocks defect (July 2018): A table preference about indexes, but it isn’t used during create or rebuild of index.
    • table_cached_blocks and RAC (Nov 2015): if you set the table preference to adjust the index clustering_factor, should you allow for the effects of RAC?
    • table_cached_blocks (May 2013): At last, a “legal” way to adjust the clustering_factor to a realistic value.
  • Optimising statspack (Oct 2016): selective stats collection for data dictionary / internal queries.
  • v$lock efficiency (May 2013): the benefit of collecting stats on individual x$ structures
  • Minimum Stats (Sept 2012): auto_sample_size on an index tends to mean 1,140 blocks

Other Authors

January 26, 2022

system_stats() hint

Filed under: 18c,19c,Hints,Oracle,Statistics,System Stats — Jonathan Lewis @ 9:46 am GMT Jan 26,2022

Starting from 18.3 Oracle introduced the system_stats() hint, apparently allowing you to set the system statistics for the duration of a query. However the hint didn’t seem to have any effect in that version of Oracle – even though the fix_control that seemed to be the most relevant (QKSFM_DBMS_STATS_24952618) was set to 1, so maybe the hint was acting strictly according to the fix control description, which was: “turn on Exadata stats: MBRC,IOTFRSPEED,IOSEEKTIME” (or maybe the fix control had nothing to do with the hint)

According to my notes I had a test that showed it working on live SQL, which (in my notes) I said was running 19.2 at the time; however, I can’t get it to work on 19.11.0.0 or 21.3.0.0 on a Linux VM (or on the current Live SQL version) despite a load of fiddling with potentially relevant hidden parameters, fix controls, and numeric event numbers. So maybe it is only for Exadata.

It’s not documented, of course, but I’m fairly confident I’m using the correct syntax – which was quite easy to find (sometimes you get lucky) because a search through the binary for the hint text produced a perfect result:


[oracle@linux183 bin]$ strings -a oracle | grep -T -n  -i system_stats\(
1762556:BEGIN :1 := dbms_stats_internal.store_system_stats(:2, :3, :4); END;
1787190:system_stats(mbrc=%f ioseektim=%f iotfrspeed=%f)

So it would seem (from line 1787190) that we can override three of the system statistics: mbrc, ioseektim, and iotfrspeed. Thanks to the hint_report option that 19c introduced to dispay_xxxxxx() calls in dbms_xplan it’s easy to see that this syntax is correct but unused. From a call to dbms_xplan.display_cursor() in 19.11.0.0:

select  /*+ system_stats(mbrc=128 ioseektim=1 iotfrspeed=262144) */ count(*) from t1

Plan hash value: 3724264953

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |  2732 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 50000 |  2732   (1)| 00:00:01 |
-------------------------------------------------------------------

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   0 -  STATEMENT
         U -  system_stats(mbrc=128 ioseektim=1 iotfrspeed=262144)

Other tests reported shorter versions of the hint (e.g. /*+ system_stats(mbrc=128) */ ) as errors:


Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (E - Syntax error (1))
---------------------------------------------------------------------------
   1 -  SEL$1
         E -  system_stats

In passing, it’s interesting to note that the text was reported as a “query block” hint (sel$1) when it had a syntax error despite being a “statement-level” hint when it was recognised. Presumably the generic parsing rule is: “it’s a query block hint unless proved otherwise”.

The call to dbms_stat_internal.store_system_stats() that also dropped out of the scan of the executable looks as if it’s the function that sets the “noworkload” statistics – the three parameters are, in order: ioseektim, iotfrspeed, cpuspeednw – but possibly it’s the internal call used when you use the ‘EXADATA’ option for gathering system stats.

Bottom line:

Maybe there’s a way to switch this hint on to override the default system stats; maybe it just needs to be run on Exadata; and maybe – if it can be switched on – it could be attached as an SQL_Patch.  Experimentation left to readers who have access to an Exadata system, any results are welcome.

January 25, 2022

Catalog Catalogue

Filed under: Oracle — Jonathan Lewis @ 12:22 pm GMT Jan 25,2022

This is a good starting point for researching a topic on my blog. It’s simply a list of the various catalog(ue)s of articles that I’ve written over the years. There’s still a lot of work to do, though, as there are more than 1,500 published articles on the blog and though I have published a few of the catalogues already they are almost guaranteed to be incomplete.

The target state is for each catalogue to list all relevant articles (which means some articles will appear in more than one catalogue), generally in reverse order of publication, with the date of each entry.

Each entry lists the number of articles currently in the catalogue, breaking them down by author (viz: me / others). In the near future, when most of the catalogues have a reasonable number of recent entries, this catalogue will datestamp each listed catalogue with its last update date and highlight a short selection of the most recently updated.

 

October 11, 2021

Adaptive Study

Filed under: Execution plans,Oracle — Jonathan Lewis @ 11:57 am BST Oct 11,2021

This is a little case study of adaptive optimisation in Oracle 19c with a surprising side-effect showing up when the optimizer gave the execution engine the option to “do the right thing” and the execution engine took it – except the “right thing” turned out to be a wrong thing.

We started with a request to the Oracle-L list server asking about the difference between the operations “table access by rowid” and “table access by rowid batched” and why changing the parameter “optimizer_adaptive_reporting_only” should make a plan switch from one to the other, and how much of a performance impact this would have because this was the only change that showed up in a plan that went from fast (enough) to very slow when the parameter was changed from true to false.

The batching (or not) of the table access really shouldn’t make much difference; the batch option tends to appear if there’s a “blocking” operation (such as a hash join) further up the execution plan, but the mechanism by which a rowsource is produced and passed up the tree is only likely to be affected very slightly. So there had to be something else going on.

Fortunately the OP had the SQL Monitor reports available from a fast / non-batched / reporting only = true run and a slow / batched / “reporting only = false” run. I’ve shown these below with the option to expand and contract them on demand:

Fast plan (reporting only):

Click on this line to expand the “reporting only = true (fast)” plan
Global Information
------------------------------
 Status              :  DONE (ALL ROWS)         
 Instance ID         :  2                       
 Session             :  XXXXX (510:5394) 
 SQL ID              :  791qwn38bq6gv           
 SQL Execution ID    :  33554432                
 Execution Started   :  10/07/2021 11:46:56     
 First Refresh Time  :  10/07/2021 11:46:56     
 Last Refresh Time   :  10/07/2021 11:51:36     
 Duration            :  280s                    
 Module/Action       :  SQL*Plus/-              
 Service             :  XXXXX.XXXXX.com 
 Program             :  sqlplus.exe             
 Fetch Calls         :  370                     

Global Stats
===========================================================================
| Elapsed |   Cpu   |    IO    | Cluster  | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
===========================================================================
|     252 |     170 |       71 |       11 |   370 |    39M | 251K |   2GB |
===========================================================================

SQL Plan Monitoring Details (Plan Hash Value=250668601)
===============================================================================================================================================================================
| Id |                      Operation                       |             Name              |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  |  Mem  |
|    |                                                      |                               | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes | (Max) |
===============================================================================================================================================================================
|  0 | SELECT STATEMENT                                     |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
|  1 |   FILTER                                             |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
|  2 |    NESTED LOOPS OUTER                                |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  3 |     NESTED LOOPS OUTER                               |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  4 |      HASH JOIN OUTER                                 |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  5 |       NESTED LOOPS OUTER                             |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  6 |        STATISTICS COLLECTOR                          |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
|  7 |         NESTED LOOPS OUTER                           |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  8 |          HASH JOIN OUTER                             |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  9 |           NESTED LOOPS OUTER                         |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
| 10 |            STATISTICS COLLECTOR                      |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
| 11 |             NESTED LOOPS OUTER                       |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
| 12 |              NESTED LOOPS OUTER                      |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
| 13 |               NESTED LOOPS                           |                               |    272K |    2M |       279 |     +2 |     1 |       2M |       |       |     . |
| 14 |                NESTED LOOPS OUTER                    |                               |    272K |    2M |       279 |     +2 |     1 |       2M |       |       |     . |
| 15 |                 NESTED LOOPS                         |                               |    272K |    2M |       279 |     +2 |     1 |       2M |       |       |     . |
| 16 |                  NESTED LOOPS OUTER                  |                               |    272K |    1M |       279 |     +2 |     1 |       2M |       |       |     . |
| 17 |                   NESTED LOOPS                       |                               |    272K |    1M |       279 |     +2 |     1 |       2M |       |       |     . |
| 18 |                    FILTER                            |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
| 19 |                     NESTED LOOPS OUTER               |                               |    272K |  598K |       279 |     +2 |     1 |       2M |       |       |     . |
| 20 |                      VIEW                            | index$_join$_006              |    276K | 48299 |       279 |     +2 |     1 |       2M |       |       |     . |
| 21 |                       HASH JOIN                      |                               |         |       |       279 |     +2 |     1 |       2M |       |       | 132MB |
| 22 |                        HASH JOIN                     |                               |         |       |         2 |     +1 |     1 |       2M |       |       | 124MB |
| 23 |                         INDEX STORAGE FAST FULL SCAN | TET_IX2                       |    276K |  8505 |         1 |     +2 |     1 |       2M |       |       |     . |
| 24 |                         INDEX STORAGE FAST FULL SCAN | TET_IX4                       |    276K | 13077 |         1 |     +2 |     1 |       2M |       |       |     . |
| 25 |                        INDEX STORAGE FAST FULL SCAN  | TET_PK                        |    276K | 11889 |       279 |     +2 |     1 |       2M |   149 |  62MB |     . |
| 26 |                      TABLE ACCESS BY INDEX ROWID     | TT                            |       1 |     2 |       279 |     +2 |    2M |       2M |  2347 |  18MB |     . |
| 27 |                       INDEX UNIQUE SCAN              | TT_PK                         |       1 |     1 |       279 |     +2 |    2M |       2M |    11 | 90112 |     . |
| 28 |                    TABLE ACCESS BY INDEX ROWID       | TM                            |       1 |     2 |       279 |     +2 |    2M |       2M | 12476 |  97MB |     . |
| 29 |                     INDEX UNIQUE SCAN                | TM_PK                         |       1 |     1 |       279 |     +2 |    2M |       2M |  1683 |  13MB |     . |
| 30 |                   TABLE ACCESS BY INDEX ROWID        | TU                            |       1 |     1 |       257 |    +21 |    2M |    17764 |   137 |   1MB |     . |
| 31 |                    INDEX UNIQUE SCAN                 | TU_PK                         |       1 |       |       257 |    +21 |    2M |    17764 |     1 |  8192 |     . |
| 32 |                  TABLE ACCESS BY INDEX ROWID         | TEP                           |       1 |     2 |       279 |     +2 |    2M |       2M |  155K |   1GB |     . |
| 33 |                   INDEX UNIQUE SCAN                  | TEP_PK                        |       1 |     1 |       279 |     +2 |    2M |       2M |  1729 |  14MB |     . |
| 34 |                 TABLE ACCESS BY INDEX ROWID          | TLIM                          |       1 |     1 |       279 |     +2 |    2M |       2M |       |       |     . |
| 35 |                  INDEX UNIQUE SCAN                   | TLIM_PK                       |       1 |       |       279 |     +2 |    2M |       2M |       |       |     . |
| 36 |                TABLE ACCESS BY INDEX ROWID           | TLPSE                         |       1 |     1 |       279 |     +2 |    2M |       2M |       |       |     . |
| 37 |                 INDEX UNIQUE SCAN                    | TLPSE_PK                      |       1 |       |       279 |     +2 |    2M |       2M |       |       |     . |
| 38 |               INDEX RANGE SCAN                       | TCX_IX2                       |       1 |     2 |       279 |     +2 |    2M |       2M |  8870 |  69MB |     . |
| 39 |              TABLE ACCESS BY INDEX ROWID             | TC                            |       1 |     2 |       279 |     +2 |    2M |       2M | 14648 | 114MB |     . |
| 40 |               INDEX UNIQUE SCAN                      | TC_PK                         |       1 |     1 |       279 |     +2 |    2M |       2M |   157 |   1MB |     . |
| 41 |            INDEX RANGE SCAN                          | TCX_PK                        |       1 |     2 |       279 |     +2 |    2M |       2M |       |       |     . |
| 42 |           INDEX RANGE SCAN                           | TCX_PK                        |       1 |     2 |           |        |       |          |       |       |     . |
| 43 |          TABLE ACCESS BY INDEX ROWID                 | TC                            |       1 |     2 |       279 |     +2 |    2M |       2M | 16037 | 125MB |     . |
| 44 |           INDEX UNIQUE SCAN                          | TC_PK                         |       1 |     1 |       279 |     +2 |    2M |       2M |   224 |   2MB |     . |
| 45 |        TABLE ACCESS BY INDEX ROWID                   | TP                            |       1 |     3 |       279 |     +2 |    2M |       2M |       |       |     . |
| 46 |         INDEX RANGE SCAN                             | TP_PK                         |      15 |     1 |       279 |     +2 |    2M |      28M |       |       |     . |
| 47 |       TABLE ACCESS BY INDEX ROWID                    | TP                            |       1 |     3 |           |        |       |          |       |       |     . |
| 48 |        INDEX RANGE SCAN                              | TP_PK                         |      15 |     1 |           |        |       |          |       |       |     . |
| 49 |      TABLE ACCESS STORAGE FULL FIRST ROWS            | TLIET                         |       1 |     3 |       279 |     +2 |    2M |       2M |       |       |     . |
| 50 |     VIEW PUSHED PREDICATE                            | TEB_VW                        |       1 |    57 |       256 |    +24 |    2M |     1459 |       |       |     . |
| 51 |      NESTED LOOPS OUTER                              |                               |       1 |    57 |       272 |     +8 |    2M |     1459 |       |       |     . |
| 52 |       NESTED LOOPS                                   |                               |       1 |    55 |       256 |    +24 |    2M |     1459 |       |       |     . |
| 53 |        NESTED LOOPS                                  |                               |       1 |    53 |       256 |    +24 |    2M |     1459 |       |       |     . |
| 54 |         NESTED LOOPS                                 |                               |       1 |    51 |       272 |     +9 |    2M |     1459 |       |       |     . |
| 55 |          NESTED LOOPS                                |                               |       5 |    41 |       279 |     +2 |    2M |     6965 |       |       |     . |
| 56 |           NESTED LOOPS                               |                               |       1 |     7 |       279 |     +2 |    2M |     770K |       |       |     . |
| 57 |            NESTED LOOPS                              |                               |       1 |     4 |       279 |     +2 |    2M |     770K |       |       |     . |
| 58 |             NESTED LOOPS                             |                               |       1 |     3 |       279 |     +2 |    2M |     770K |       |       |     . |
| 59 |              TABLE ACCESS BY INDEX ROWID             | TEP                           |       1 |     3 |       279 |     +2 |    2M |     770K |       |       |     . |
| 60 |               INDEX UNIQUE SCAN                      | TEP_PK                        |       1 |     2 |       279 |     +2 |    2M |       2M |       |       |     . |
| 61 |              INDEX RANGE SCAN                        | TLP_IX1                       |       1 |       |       279 |     +2 |  770K |     770K |       |       |     . |
| 62 |             VIEW                                     |                               |       1 |     1 |       279 |     +2 |  770K |     770K |       |       |     . |
| 63 |              SORT AGGREGATE                          |                               |       1 |       |       279 |     +2 |  770K |     770K |       |       |     . |
| 64 |               TABLE ACCESS BY INDEX ROWID            | TPR                           |       1 |     1 |       279 |     +2 |  770K |     770K |       |       |     . |
| 65 |                INDEX UNIQUE SCAN                     | TPR_PK                        |       1 |       |       279 |     +2 |  770K |     770K |       |       |     . |
| 66 |            TABLE ACCESS BY INDEX ROWID               | TET                           |       1 |     3 |       279 |     +2 |  770K |     770K | 28892 | 226MB |     . |
| 67 |             INDEX RANGE SCAN                         | TET_Ix1                       |       1 |     2 |       279 |     +2 |  770K |     899K |  6957 |  54MB |     . |
| 68 |           TABLE ACCESS BY INDEX ROWID                | TWE                           |       5 |    34 |       272 |     +9 |  770K |     6965 |   890 |   7MB |     . |
| 69 |            INDEX RANGE SCAN                          | TWE_IDX1                      |      35 |     2 |       272 |     +9 |  770K |     6965 |    22 | 176KB |     . |
| 70 |          TABLE ACCESS BY INDEX ROWID                 | TT                            |       1 |     2 |       272 |     +9 |  6965 |     1459 |       |       |     . |
| 71 |           INDEX UNIQUE SCAN                          | TT_PK                         |       1 |     1 |       272 |     +9 |  6965 |     6965 |       |       |     . |
| 72 |         INDEX RANGE SCAN                             | TCX_IX2                       |       1 |     2 |       256 |    +24 |  1459 |     1459 |   932 |   7MB |     . |
| 73 |        TABLE ACCESS BY INDEX ROWID                   | TC                            |       1 |     2 |       256 |    +24 |  1459 |     1459 |       |       |     . |
| 74 |         INDEX UNIQUE SCAN                            | TC_PK                         |       1 |     1 |       256 |    +24 |  1459 |     1459 |       |       |     . |
| 75 |       TABLE ACCESS BY INDEX ROWID                    | TLS                           |       1 |     2 |       256 |    +24 |  1459 |     1451 |       |       |     . |
| 76 |        INDEX SKIP SCAN                               | TLS_PK                        |       1 |     1 |       256 |    +24 |  1459 |     1451 |       |       |     . |
| 77 |    SORT AGGREGATE                                    |                               |       1 |       |       279 |     +2 |    2M |       2M |       |       |     . |
| 78 |     FIRST ROW                                        |                               |       1 |     3 |       279 |     +2 |    2M |       2M |       |       |     . |
| 79 |      INDEX RANGE SCAN (MIN/MAX)                      | TCX_IX2                       |       1 |     3 |       279 |     +2 |    2M |       2M |       |       |     . |
===============================================================================================================================================================================

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------
    0 -  STATEMENT
         U -  first_rows / hint overridden by another in parent query block
           -  first_rows
 
  56 -  SEL$5
           -  no_merge
 
Note
-----
   - this is an adaptive plan

Slow plan (runtime adapted):

Click on this line to expand the “reporting_only = false (slow)” plan
Global Information
------------------------------
 Status              :  DONE (ALL ROWS)          
 Instance ID         :  2                        
 Session             :  XXXXX (509:27860) 
 SQL ID              :  8t19y7v5j9ztg            
 SQL Execution ID    :  33554432                 
 Execution Started   :  10/07/2021 07:56:09      
 First Refresh Time  :  10/07/2021 07:56:09      
 Last Refresh Time   :  10/07/2021 08:07:17      
 Duration            :  668s                     
 Module/Action       :  SQL*Plus/-               
 Service             :  XXXXX.XXXXX.com  
 Program             :  sqlplus.exe              
 Fetch Calls         :  370                      

Global Stats
==========================================================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  | Fetch | Buffer | Read | Read  | Write | Write |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs | Bytes | Reqs  | Bytes | Returned Bytes |
==========================================================================================================================
|     705 |     280 |      270 |        0.00 |      155 |   370 |    40M | 984K |  11GB |  6422 |   3GB |            6GB |
==========================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3015036808)
========================================================================================================================================================================================================
| Id |                      Operation                       |             Name              |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write |  Mem  | Temp  |
|    |                                                      |                               | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Reqs  | Bytes | (Max) | (Max) |
========================================================================================================================================================================================================
|  0 | SELECT STATEMENT                                     |                               |         |       |       512 |   +157 |     1 |       2M |       |       |       |       |     . |     . |
|  1 |   FILTER                                             |                               |         |       |       512 |   +157 |     1 |       2M |       |       |       |       |     . |     . |
|  2 |    NESTED LOOPS OUTER                                |                               |       1 |    3M |       512 |   +157 |     1 |       2M |       |       |       |       |     . |     . |
|  3 |     NESTED LOOPS OUTER                               |                               |       1 |    3M |       512 |   +157 |     1 |       2M |       |       |       |       |     . |     . |
|  4 |      HASH JOIN OUTER                                 |                               |       1 |    3M |       538 |   +131 |     1 |       2M |  3387 |   2GB |  3387 |   2GB | 450MB |   2GB |
|  5 |       NESTED LOOPS OUTER                             |                               |       1 |    3M |        27 |   +131 |     1 |       2M |       |       |       |       |     . |     . |
|  6 |        STATISTICS COLLECTOR                          |                               |         |       |        27 |   +131 |     1 |       2M |       |       |       |       |     . |     . |
|  7 |         NESTED LOOPS OUTER                           |                               |       1 |    3M |        27 |   +131 |     1 |       2M |       |       |       |       |     . |     . |
|  8 |          HASH JOIN OUTER                             |                               |       1 |    3M |       155 |     +3 |     1 |       2M |  3035 |   1GB |  3035 |   1GB | 309MB |   1GB |
|  9 |           NESTED LOOPS OUTER                         |                               |       1 |    3M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 10 |            STATISTICS COLLECTOR                      |                               |         |       |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 11 |             NESTED LOOPS OUTER                       |                               |       1 |    3M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 12 |              NESTED LOOPS OUTER                      |                               |       1 |    3M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 13 |               NESTED LOOPS                           |                               |    272K |    2M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 14 |                NESTED LOOPS OUTER                    |                               |    272K |    2M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 15 |                 NESTED LOOPS                         |                               |    272K |    2M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 16 |                  NESTED LOOPS OUTER                  |                               |    272K |    1M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 17 |                   NESTED LOOPS                       |                               |    272K |    1M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 18 |                    FILTER                            |                               |         |       |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 19 |                     NESTED LOOPS OUTER               |                               |    272K |  598K |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 20 |                      VIEW                            | index$_join$_006              |    276K | 48299 |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 21 |                       HASH JOIN                      |                               |         |       |       129 |     +3 |     1 |       2M |       |       |       |       | 132MB |     . |
| 22 |                        HASH JOIN                     |                               |         |       |         3 |     +1 |     1 |       2M |       |       |       |       | 124MB |     . |
| 23 |                         INDEX STORAGE FAST FULL SCAN | TET_IX2                       |    276K |  8505 |         1 |     +1 |     1 |       2M |   129 |  54MB |       |       |     . |     . |
| 24 |                         INDEX STORAGE FAST FULL SCAN | TET_IX4                       |    276K | 13077 |         3 |     +1 |     1 |       2M |   167 |  81MB |       |       |     . |     . |
| 25 |                        INDEX STORAGE FAST FULL SCAN  | TET_PK                        |    276K | 11889 |       129 |     +3 |     1 |       2M |   198 |  61MB |       |       |     . |     . |
| 26 |                      TABLE ACCESS BY INDEX ROWID     | TT                            |       1 |     2 |       129 |     +3 |    2M |       2M |  1488 |  12MB |       |       |     . |     . |
| 27 |                       INDEX UNIQUE SCAN              | TT_PK                         |       1 |     1 |       129 |     +3 |    2M |       2M |     7 | 57344 |       |       |     . |     . |
| 28 |                    TABLE ACCESS BY INDEX ROWID       | TM                            |       1 |     2 |       129 |     +3 |    2M |       2M |  9875 |  77MB |       |       |     . |     . |
| 29 |                     INDEX UNIQUE SCAN                | TM_PK                         |       1 |     1 |       129 |     +3 |    2M |       2M |  1235 |  10MB |       |       |     . |     . |
| 30 |                   TABLE ACCESS BY INDEX ROWID        | TU                            |       1 |     1 |       119 |    +11 |    2M |    17764 |       |       |       |       |     . |     . |
| 31 |                    INDEX UNIQUE SCAN                 | TU_PK                         |       1 |       |       119 |    +11 |    2M |    17764 |       |       |       |       |     . |     . |
| 32 |                  TABLE ACCESS BY INDEX ROWID         | TEP                           |       1 |     2 |       129 |     +3 |    2M |       2M |  140K |   1GB |       |       |     . |     . |
| 33 |                   INDEX UNIQUE SCAN                  | TEP_PK                        |       1 |     1 |       129 |     +3 |    2M |       2M |  1478 |  12MB |       |       |     . |     . |
| 34 |                 TABLE ACCESS BY INDEX ROWID          | TLIM                          |       1 |     1 |       129 |     +3 |    2M |       2M |       |       |       |       |     . |     . |
| 35 |                  INDEX UNIQUE SCAN                   | TLIM_PK                       |       1 |       |       129 |     +3 |    2M |       2M |       |       |       |       |     . |     . |
| 36 |                TABLE ACCESS BY INDEX ROWID           | TLPSE                         |       1 |     1 |       129 |     +3 |    2M |       2M |       |       |       |       |     . |     . |
| 37 |                 INDEX UNIQUE SCAN                    | TLPSE_PK                      |       1 |       |       129 |     +3 |    2M |       2M |       |       |       |       |     . |     . |
| 38 |               INDEX RANGE SCAN                       | TCX_IX2                       |       1 |     2 |       129 |     +3 |    2M |       2M |  4642 |  36MB |       |       |     . |     . |
| 39 |              TABLE ACCESS BY INDEX ROWID             | TC                            |       1 |     2 |       129 |     +3 |    2M |       2M | 22307 | 174MB |       |       |     . |     . |
| 40 |               INDEX UNIQUE SCAN                      | TC_PK                         |       1 |     1 |       129 |     +3 |    2M |       2M |   546 |   4MB |       |       |     . |     . |
| 41 |            INDEX RANGE SCAN                          | TCX_PK                        |       1 |     2 |           |        |       |          |       |       |       |       |     . |     . |
| 42 |           INDEX RANGE SCAN                           | TCX_PK                        |       1 |     2 |         1 |   +131 |     1 |     976K |       |       |       |       |     . |     . |
| 43 |          TABLE ACCESS BY INDEX ROWID                 | TC                            |       1 |     2 |        27 |   +131 |    2M |       2M | 21549 | 168MB |       |       |     . |     . |
| 44 |           INDEX UNIQUE SCAN                          | TC_PK                         |       1 |     1 |        27 |   +131 |    2M |       2M |   959 |   7MB |       |       |     . |     . |
| 45 |        TABLE ACCESS BY INDEX ROWID BATCHED           | TP                            |       1 |     3 |           |        |       |          |       |       |       |       |     . |     . |
| 46 |         INDEX RANGE SCAN                             | TP_PK                         |      15 |     1 |           |        |       |          |       |       |       |       |     . |     . |
| 47 |       TABLE ACCESS BY INDEX ROWID BATCHED            | TP                            |       1 |     3 |        36 |   +157 |     1 |       15 |       |       |       |       |     . |     . |
| 48 |        INDEX RANGE SCAN                              | TP_PK                         |      15 |     1 |        36 |   +157 |     1 |       15 |       |       |       |       |     . |     . |
| 49 |      TABLE ACCESS STORAGE FULL FIRST ROWS            | TLIET                         |       1 |     3 |       512 |   +157 |    2M |       2M |       |       |       |       |     . |     . |
| 50 |     VIEW PUSHED PREDICATE                            | TEB_VW                        |       1 |    57 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 51 |      NESTED LOOPS OUTER                              |                               |       1 |    57 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 52 |       NESTED LOOPS                                   |                               |       1 |    55 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 53 |        NESTED LOOPS                                  |                               |       1 |    53 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 54 |         NESTED LOOPS                                 |                               |       1 |    51 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 55 |          NESTED LOOPS                                |                               |       5 |    41 |       510 |   +159 |    2M |     6965 |       |       |       |       |     . |     . |
| 56 |           NESTED LOOPS                               |                               |       1 |     7 |       510 |   +159 |    2M |     770K |       |       |       |       |     . |     . |
| 57 |            NESTED LOOPS                              |                               |       1 |     4 |       510 |   +159 |    2M |     770K |       |       |       |       |     . |     . |
| 58 |             NESTED LOOPS                             |                               |       1 |     3 |       510 |   +159 |    2M |     770K |       |       |       |       |     . |     . |
| 59 |              TABLE ACCESS BY INDEX ROWID             | TEP                           |       1 |     3 |       512 |   +157 |    2M |     770K |  661K |   5GB |       |       |     . |     . |
| 60 |               INDEX UNIQUE SCAN                      | TEP_PK                        |       1 |     2 |       512 |   +157 |    2M |       2M |  2934 |  23MB |       |       |     . |     . |
| 61 |              INDEX RANGE SCAN                        | TLP_IX1                       |       1 |       |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 62 |             VIEW                                     |                               |       1 |     1 |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 63 |              SORT AGGREGATE                          |                               |       1 |       |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 64 |               TABLE ACCESS BY INDEX ROWID            | TPR                           |       1 |     1 |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 65 |                INDEX UNIQUE SCAN                     | TPR_PK                        |       1 |       |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 66 |            TABLE ACCESS BY INDEX ROWID BATCHED       | TET                           |       1 |     3 |       511 |   +158 |  770K |     770K | 79759 | 623MB |       |       |     . |     . |
| 67 |             INDEX RANGE SCAN                         | TET_Ix1                       |       1 |     2 |       510 |   +159 |  770K |     899K | 15834 | 124MB |       |       |     . |     . |
| 68 |           TABLE ACCESS BY INDEX ROWID BATCHED        | TWE                           |       5 |    34 |       506 |   +163 |  770K |     6965 |  2080 |  16MB |       |       |     . |     . |
| 69 |            INDEX RANGE SCAN                          | TWE_IDX1                      |      35 |     2 |       506 |   +163 |  770K |     6965 |   118 | 944KB |       |       |     . |     . |
| 70 |          TABLE ACCESS BY INDEX ROWID                 | TT                            |       1 |     2 |       506 |   +163 |  6965 |     1459 |   208 |   2MB |       |       |     . |     . |
| 71 |           INDEX UNIQUE SCAN                          | TT_PK                         |       1 |     1 |       506 |   +163 |  6965 |     6965 |       |       |       |       |     . |     . |
| 72 |         INDEX RANGE SCAN                             | TCX_IX2                       |       1 |     2 |       506 |   +163 |  1459 |     1459 |  1388 |  11MB |       |       |     . |     . |
| 73 |        TABLE ACCESS BY INDEX ROWID                   | TC                            |       1 |     2 |       506 |   +163 |  1459 |     1459 |   936 |   7MB |       |       |     . |     . |
| 74 |         INDEX UNIQUE SCAN                            | TC_PK                         |       1 |     1 |       506 |   +163 |  1459 |     1459 |    75 | 600KB |       |       |     . |     . |
| 75 |       TABLE ACCESS BY INDEX ROWID BATCHED            | TLS                           |       1 |     2 |       506 |   +163 |  1459 |     1451 |     1 |  8192 |       |       |     . |     . |
| 76 |        INDEX SKIP SCAN                               | TLS_PK                        |       1 |     1 |       506 |   +163 |  1459 |     1451 |     1 |  8192 |       |       |     . |     . |
| 77 |    SORT AGGREGATE                                    |                               |       1 |       |       512 |   +157 |    2M |       2M |       |       |       |       |     . |     . |
| 78 |     FIRST ROW                                        |                               |       1 |     3 |       512 |   +157 |    2M |       2M |       |       |       |       |     . |     . |
| 79 |      INDEX RANGE SCAN (MIN/MAX)                      | TCX_IX2                       |       1 |     3 |       512 |   +157 |    2M |       2M |  9356 |  73MB |       |       |     . |     . |
=======================================================================================================================================================================================================

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------
   0 -  STATEMENT
         U -  first_rows / hint overridden by another in parent query block
           -  first_rows
 
  56 -  SEL$5
           -  no_merge
 
Note
-----
   - this is an adaptive plan

If you want to pull these plans into separate windows and compare (nothing but) the Operations and Names line by line you’ll find that the only differences appear at operations 45, 47, 66, 68, and 75:

| 45 |        TABLE ACCESS BY INDEX ROWID                   | TP                            |
| 47 |       TABLE ACCESS BY INDEX ROWID                    | TP                            |
| 66 |            TABLE ACCESS BY INDEX ROWID               | TET                           |
| 68 |           TABLE ACCESS BY INDEX ROWID                | TWE                           |
| 75 |       TABLE ACCESS BY INDEX ROWID                    | TLS                           |

| 45 |        TABLE ACCESS BY INDEX ROWID BATCHED           | TP                            |
| 47 |       TABLE ACCESS BY INDEX ROWID BATCHED            | TP                            |
| 66 |            TABLE ACCESS BY INDEX ROWID BATCHED       | TET                           |
| 68 |           TABLE ACCESS BY INDEX ROWID BATCHED        | TWE                           |
| 75 |       TABLE ACCESS BY INDEX ROWID BATCHED            | TLS                           |

So what could possibly make one plan so much slower than the other?

There are all sorts of bits and pieces in these plans that you need to be able to spot “in passing” if you want to become fluent at understanding execution plans. It’s something that takes a lot of practise but there’s one general tip (or warning, perhaps) that I can offer.

If you start out by looking for one particular thing you’ll miss lots of important clues; on a first pass through the plan just try to notice anything that looks a little interesting or informative, then go back for a more detailed examination on a second pass through the plan.

I won’t go through the entire pattern of thought that went through my mind as I started looking at these plans, but here are a couple of flags I raised

  • Adaptive plans in SQL Monitor – we’re likely to see some “statistics collector” operations and that’s the “obvious” source of the anomaly, but reading plans that include their adaptive bits can be a mind-bending experience.
  • Global Stats of the slow one says 984K read requests (compared to 251K for the fast plan) – that might explain the difference in timing – keep an eye out for where the big numbers appear. (NB Don’t, at this point, go looking for them as that may lead you into missing the real issue.)
  • The slow plan plan shows the top operation with a Start Active of +157 while the fast plan has a start active of +2: that’s pretty consistent with a comment the user made (but I hadn’t mentioned) about the response time from the user’s perspective; and it tells us that there’s a blocking operation somewhere in the slow plan. That’s nice because that’s what we might expect from seeing an adaptive plan switching from a nested loop to a hash join. (So we already think we’re a clever bunny – too bad it wasn’t quite the right answer.)
  • There are two places in the plan which report Statistics Collector, both having a Nested Loop Outer then a Hash Join Outer immediately above them as the two candidate consumers for the rowsource supplied by the Statistic Collector’s child operation. (Note: in some cases you will see two Nested Loop operations above a Statistics Collector before you get to the Hash Join, but that’s just because Oracle may implement a nested loop join in two steps, first to the index, second to the table)

First thoughts

Bearing in mind the warning about not digging “too deep too soon”, here are a few potentially significant observations. In the following notes I’ll simply refer to the plans as the “fast” plan and the “slow” plan.

  • The slow plan shows a large hash join with spill to disc at operation 4, and that’s one of the options triggered by the statistics collector at operation 6. It’s a little confusing that the nested loop join following it also reports execution statistics because only one of the two operations would have been doing anything, but we’ll assume for now that that’s a reporting error and postpone making a fuss about it. The same pattern appears at operations 8, 9 and 10.
  • A cross-check to the fast plan shows that the hash joins don’t do any work at the corresponding operations (although it also displays the oddity of operations that (we believe) didn’t actually run reporting a long Time Active.
  • The slow plan has an oddity at operation 41 – it’s an index range scan of a primary key (TCX_PK) that doesn’t report any execution statistics. Cross-checking to the slow plan we see that it’s operation 42 that doesn’t report any execution statistics (and it’s also an index range scan, and using the same index!) . We note that the fast plan “Starts” its range scan 2 million times, while the slow plan starts just once, starting at time +131 and having an active time of 1 second. [side-note: I am a little suspicious of that number – It looks to me as if it ought to be reporting 27 seconds]
  • Keep going – because just a bit further down we see that the slow plan has no stats for operations 45 and 46 (index range scan of TP_PK with table access of TP) while the fast plan has no stats for operations 47 and 48 (also an index range scan of TP_PK with table access to TP). Again we see the same pattern that the slow plan executes the operation just once while the fast plan executes its operations 2M times.
  • Keep going – the previous two observations are interesing and probably need further investigation, but they might not be critical. The very next line (operation 49) in both plans shows us a “TABLE ACCESS STORAGE FULL FIRST ROWS” that executes 2 million times – that’s got to hurt, surely, but let’s not get side-tracked just yet.
  • Operation 50 is a “VIEW PUSHED PREDICATE” – that almost certainly means it’s the second child of a nested loop join with a join predicate pushed into a non-mergeable view (and the view name is TEB_VW so it’s not a view created by an internal transformation) and the operation has, like so many other lines in the plan, started 2 million times.
  • Looking at the rest of the plan, there are no more statistics collectors and the plans have an exact match on operations. Unfortunately we don’t have a Predicate Information section, so we can’t tell whether matching operations were really doing the same thing (e.g. was an index range scan in one plan using more columns to probe the index than the corresponding index range scan in the other plan) however we can check times:
  • The View Pushed Predicate in the fast plan starts at time +24 [Another slightly suspicious time given all the +2 starts in the locale] and is active for 256 seconds, while in the slow plan it starts at time +163 and is active for 506 seconds. So it looks as if a lot of the excess run time of the query time is spent in this part of the plan — for no logical reason that we can see – although the access to table TEP by index TEP_PK does no physical reads in the fast plan and 661K reads in the slow plan.
  • Again we take a quick note, and move on. The final observation is that the last three lines of the plan look like the plan for a subquery block (executed very efficiently) of the “find the most recent / highest / lowest” type, and a quick check to the top of the plan shows that its parent is a FILTER operation, corroborating our first guess.

Starting to dig

Reviewing the first pass we can see that we lose a lot of “startup” time to the two hash joins where the build table in each case has to be completed before any further tables can be joined. This is in the order of 160 seconds, which is consistent with the OP’s observations, and it’s happening because adaptive plans are activated, triggering a change from nested loop joins to hash joins.

More significantly, from the perspective of time, is that the nested loop join into the View Pushed Predicate is active for twice as long in the slow plan as it is in the fast plan – so that’s a place to look a little more closely, revealing that operation 59 is probably the reason for the difference: 661 thousand read requests in the slow plan but none in the fast plan.

Unfortunately we don’t have any Activity Stats (i.e. active session history data) in the report, but since the access to the table is reported as unique access by unique index in both cases we can be fairly sure that the difference isn’t due to a difference in the Predicate Information (that isn’t in the report).

Clearly we need to stop the stop the adaptive plan from being taken to avoid the start-up delay (e.g. add a /*+ no_adaptive_plan */ hint to the query) but that still leaves two puzzles:

  1. why are the rows estimates so bad (and at least part of the reason for that is that it turned out that the query was being optimized with optimizer_mode = first_rows – that’s the legacy first_rows, not a cost-based first_rows_N);
  2. how could the same sub-plan result in far more physical reads in one case compared to the other when the critical operation is a unique index access.

The answer to the second question could be in an observation I first published 14 years ago – and it could indicate a generic threat to adaptive optimisation.

If you have an execution plan which, stripped to a minimum, looks like this:

Join Operation
        Table_X
        Table_Y

The order in which the result appears is likely to change depending on the join mechanism that Oracle chooses, viz Hash Join, Merge Join or Nested Loop Join.

Under “non-adaptive” conditions if you have a join that’s border-line between a hash join and a nested loop join it frequently means that the optimizer will fip flop between two plans like the following (leading to the classic question – nothing changed why did the plan change):

Hash Join
        Table_X
        Table_Y

Nested Loop Join
        Table_Y
        Table_X

Note that the order in which the tables appear is reversed.

As it says in another article of mine: all joins are nested loop joins, just with different startup costs”. In both the plans above Oracle picks a row from Table_Y and looks for a match in Table_X, so the order of the result set is dictated by the Table_Y regardless of whether the join is a hash join or a nested loop join. However, if Oracle has decided to use an adaptive plan and starts with the nested loop (Y -> X) and decides to switch to a hash join it doesn’t swap the join order as the join mechanism is selected, so a result set whose order would have been dictated by Table_Y turns into the same result set (we hope) but in an order dictated by Table_X.

Consequences:

If you’re using very big tables and Oracle produces an adaptive nested loop join early in the plan this may result in a later nested loop being lucky and inducing lots of “self-caching” because its driving rowsource is in a nice order. If the plan adapts to a hash join the driving data set may appear in a completely different order that makes the later nested loop jump randomly around a very large table, inducing a lot of “self-flushing” as one table block is discarded from the buffer cache to make space for another. (I published an article several years ago about how a similar – though far from identical – type of anomaly could happen with Exadata and compression: an unlucky order of data access causing a massive extra workload.)

Conclusion and further thoughts

In this note I’ve tried to present my thoughts as I’ve read through an execution plan trying to understand what it’s doing and why it’s showing the performance characteristics it does.

In this case the interpretation was made harder because the plan was an adaptive plan – and there doesn’t appear to be an option in the procedure in dbms_sql_monitor to choose between hiding and revealing the adaptive parts [ed: this statement may apply only to the text option – see comment #1 for a counter-example using the ‘Active HTML” option]; moreover there was no Activity (ASH) information supplied and we didn’t have the Predicate Information.

The performance “issue” was that when adaptive plans were allowed (as opposed to reported only) we could see that two nested loops changed to hash joins. It was fairly clear that this explained the huge delay before results started to appear, but didn’t explain why the query took so much longer to complete.

We have a hypothesis that the extra run time of the query was due to “bad luck” because we can see very clearly that a nested loop into a non-mergeable view with pushed predicate reports a huge number of single block read requests; and we know that changing a join from a nested loop to a hash join without changing the order of the child operations will change the order in which the join’s rowsource is generated.

Ini this case the query was executing under the legacy first_rows optimizer mode, and it’s possible that if first_rows_N had been used the optimizer would have behaved differently, especially since we have a query that is returning 2M rows and we only want the first few rows.

Next Steps

The obvious “next step” in this investigation is to check whether first_rows_N co-operates nicely with adaptive optimisation. After all, the only significant thing that adaptive optimisation does to (serial) execution plans is set an inflexion point to dictate when a nested loop should change to a hash join – and a hash join is a blocking operation which is rarely a good thing for a first_rows_N plan.

So, does first_rows_N disable this adaptive plan analysis, does it move the inflection point significantly, or does the optimizer simply forget that hash joins are less desirable in first_rows N optimisation. And if you’re running a system in first_rows_N mode should you disable adaptive plans by default, and only enable it for special cases.

I also have an urge to test a couple of ideas about why the two timing anomalies I mentioned have appeared, but it’s already taken me several hours to write notes (including a few replies to the list server) about the 30 minutes I’ve spent looking at an execution plan, so any further experimens will have to go on my to-do list.

December 8, 2020

Hash Joins

Filed under: Hash Join,Joins,Oracle — Jonathan Lewis @ 11:55 am GMT Dec 8,2020

This is a follow-up to a note that’s 10 years old [opens in new tab], prompted by a thread on the Oracle Developer Community forum asking about the order of operation in an execution plan, and following up with a question about PGA memory use that I had left open in the original note.

The original note pointed out that the serial execution plan for a 4 table join that used hash joins for every join and had a leading() hint dictating a single join order could still have 8 distinct execution plans (which I then corrected to 4, because half of them were eliminated by an odd little inconsistency of the leading() hint).

The source of the number 8 was the fact that when you hint a hash join with /*+ use_hash(next_alias) */ the optimizer will consider a /*+ swap_join_inputs(next_alias) */ – in other words, it will choose which of the two inputs should be the “build” table and which the “probe” table, even though the human eye will probably assume that next_alias was clearly intended as the probe table. (The “little inconsistency” in the leading() hint is that it blocks the option for a swap for the first – and only the first – join unless it’s explicitly hinted.)

In the article I showed all 8 possible plans, and described how the two plans at the extreme ends of my list would operate at run-time. I’ve reproduced these two plans below, preceded by the hints that produced them:

leading(t1 t2 t3 t4)
use_hash(t2) no_swap_join_inputs(t2)
use_hash(t3) no_swap_join_inputs(t3)
use_hash(t4) no_swap_join_inputs(t4)
 
--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|*  3 |    HASH JOIN          |      |    70 |  1260 |     8 |
|*  4 |     HASH JOIN         |      |    70 |   840 |     5 |
|   5 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   6 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   7 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|   8 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
--------------------------------------------------------------

leading(t1 t2 t3 t4)
use_hash(t2)    swap_join_inputs(t2)
use_hash(t3)    swap_join_inputs(t3)
use_hash(t4)    swap_join_inputs(t4)
 
--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|   3 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
|*  4 |    HASH JOIN          |      |    70 |  1260 |     8 |
|   5 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|*  6 |     HASH JOIN         |      |    70 |   840 |     5 |
|   7 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   8 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
--------------------------------------------------------------

Side note: one of the comments on the original post raised the point that (e.g.) the second of the plans above could have been generated by the join order represented by the hint leading(t2 t1 t3 t4), and that’s absolutely correct. In fact, if you see the plan for a 4 table join consisting of nothing but 3 hash join you cannot be certain what join order the optimizer was examining when it produced that plan unless you look at the CBO trace file (or check to see if it’s been explicitly and completely, hinted in the code).

Having set the scene, we come to the question that prompted this note. The question related to the second plan above, and said:

“However, this one is the plan that takes the most memory in the PGA because it builds 3 hash tables before joining any table, right?”

The quick answer to the question is: “Not necessarily.”

In the original article I had pointed out that plans following the pattern of the first plan above with N tables and N – 1 joins would have at most two build tables in memory at any one moment while the bottom plan would create N – 1 build tables in memory before any join results could be created. This does rather invite the inference that the bottom plan has to be the one that is going to use most PGA memory, but what I had said in the original article was (emphasis added):

“Notice: the number of in-memory hash (build) tables we have in the first of the 8 plans at any instant after the first join starts is two and (no matter how many tables are involved in this pattern) the number of in-memory hash tables will always be two. The actual size of the two hash tables is a little unpredictable and, as a very crude guideline, you might expect the size to grow as more tables are joined into the result set.

As a thought experiment, consider 3 small dimension tables and one big fact table. If Oracle were to create in-memory hash tables from the three dimension tables and then start scanning the fact table (following the pattern of the second plan above with t1 in the role of the fact table) probing each of the dimension tables in turn, it could deliver the first result row very quickly without requiring more memory to store intermediate results.

Conversely if Oracle were to create a tiny in-memory hash table from the first dimension and probe it with the fact table (following the pattern of the first plan above with t2 in the role of the fact table) Oracle would then have to build a very large in-memory hash table before before probing it with the second dimension table, and as that second join takes place it would be generating a new result set that would become the next big in-memory hash table.

In this thought experiment we would probably find that the optimizer did the right thing without prompting and constructed three tiny in-memory hash tables – but it’s not always so clear-cut, and even the “right” decision can result in very large intermediate build tables (and if those intermediate build tables spill to disc in an Exadata environment the change in performance can be huge).

To finish off, here’s a script to turn the thought experiment into a concrete example (by careful, but deliberately silly, hinting).

rem
rem     Script:         c_treblehash_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem 

create table t1 
as
select
        rownum          id,
        to_char(rownum) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t1 
        add constraint t1_pk primary key(id)
;

create table t2
as
select
        rownum          id,
        to_char(rownum) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t2
        add constraint t2_pk primary key(id)
;

create table t3
as
select
        rownum          id,
        to_char(rownum) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t3
        add constraint t3_pk primary key(id)
;

create table t4
nologging
as
select
        t1.id                   id1,
        t2.id                   id2,
        t3.id                   id3,
        rpad(rownum,10)         small_vc,
        rpad('x',100)           padding
from
        t1, t2, t3
;

connect test_user/test
set linesize 156
set serveroutput off

select
        /*+ 
                leading(t4 t1 t2 t3)
                full(t4) 
                use_hash(t1) full(t1) swap_join_inputs(t1)
                use_hash(t2) full(t2) swap_join_inputs(t2)
                use_hash(t3) full(t3) swap_join_inputs(t3) 
        */
        count(t1.small_vc),
        count(t2.small_vc),
        count(t3.small_vc),
        count(t4.small_vc)
from
        t4,     
        t1,     
        t2,     
        t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

column pga_max_mem format 99,999,999

select pga_max_mem from v$process where addr = (
                select paddr from v$session where sid = (
                        select sid from V$mystat where rownum = 1
                )
        )
;

connect test_user/test
set linesize 156
set serveroutput off

select
        /*+ 
                leading(t4 t1 t2 t3)
                full(t4) 
                use_hash(t1) full(t1) no_swap_join_inputs(t1)
                use_hash(t2) full(t2) no_swap_join_inputs(t2)
                use_hash(t3) full(t3) no_swap_join_inputs(t3) 
        */
        count(t1.small_vc),
        count(t2.small_vc),
        count(t3.small_vc),
        count(t4.small_vc)
from
        t4,     
        t1,     
        t2,     
        t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

column pga_max_mem format 99,999,999

select pga_max_mem from v$process where addr = (
                select paddr from v$session where sid = (
                        select sid from V$mystat where rownum = 1
                )
        )
;

All I’ve done is create three small “dimension” tables of 70 rows each then created a table which is their Cartesian join, which produces a “fact” table of 343,000 rows. Then I’ve written a simple query to join the three dimension tables to the fact table.

I’ve used swap_join_inputs() for all the joins in one version of the query, and no_swap_join_inputs() in the other versions; and I’ve reconnected to the data before running each query to make it easier to see the different impact on the PGA of the two plans.

I’ve use dbms_xplan.display_cursor() to pull the execution plans from memory after running the queries, and since I’ve not set statistics_level to all, or added the hint /*+ gather_plan_statistics */ to the queries the only rowsource execution statistics I’ll get are the PGA usage: estimates and actuals.

The final step of each test reports the maximum PGA memory usage the session reached in the course of the test.

Here are the two plans, each followed by the PGA maximum memory size.

--------------------------------------------------------------------------
| Id  | Operation             | Name | E-Rows |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |        |       |       |          |
|   1 |  SORT AGGREGATE       |      |      1 |       |       |          |
|*  2 |   HASH JOIN           |      |    343K|  2171K|  2171K| 1684K (0)|
|   3 |    TABLE ACCESS FULL  | T3   |     70 |       |       |          |
|*  4 |    HASH JOIN          |      |    343K|  2171K|  2171K| 1681K (0)|
|   5 |     TABLE ACCESS FULL | T2   |     70 |       |       |          |
|*  6 |     HASH JOIN         |      |    343K|  2171K|  2171K| 1711K (0)|
|   7 |      TABLE ACCESS FULL| T1   |     70 |       |       |          |
|   8 |      TABLE ACCESS FULL| T4   |    343K|       |       |          |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T3"."ID"="T4"."ID3")
   4 - access("T2"."ID"="T4"."ID2")
   6 - access("T1"."ID"="T4"."ID1")

PGA_MAX_MEM
-----------
 13,859,925

--------------------------------------------------------------------------
| Id  | Operation             | Name | E-Rows |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |        |       |       |          |
|   1 |  SORT AGGREGATE       |      |      1 |       |       |          |
|*  2 |   HASH JOIN           |      |    343K|    23M|  3727K|   29M (0)|
|*  3 |    HASH JOIN          |      |    343K|    23M|  3727K|   29M (0)|
|*  4 |     HASH JOIN         |      |    343K|    23M|  3667K|   29M (0)|
|   5 |      TABLE ACCESS FULL| T4   |    343K|       |       |          |
|   6 |      TABLE ACCESS FULL| T1   |     70 |       |       |          |
|   7 |     TABLE ACCESS FULL | T2   |     70 |       |       |          |
|   8 |    TABLE ACCESS FULL  | T3   |     70 |       |       |          |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T3"."ID"="T4"."ID3")
   3 - access("T2"."ID"="T4"."ID2")
   4 - access("T1"."ID"="T4"."ID1")

PGA_MAX_MEM
-----------
 52,984,917

As you can see, each of the hash joins in the first plan required roughly 1.7MB of memory. All three hash tables would have been in existence at the same time, giving a total of about 5.1MB of memory for the query. The session’s maximum PGA usage shows up as 13MB, of which 5MB was my basic “startup” PGA, leaving 3MB “unaccounted”.

In comparison, each of the hash joins in the second plan required roughly 29MB, although only two of the hash tables would have been in existence at any one moment. That’s still an allocation of 58MB for the same basic 4 table join. In fact things aren’t quite as bad as they seem in this case since the maximum PGA allocated was only about 52MB (again with 5MB of startup PGA). The apparent contradiction may be due to the way that Oracle allocates PGA in increasing chunks – the 29MB reported may have been the result of the session doing something like: “I’ve reached 23MB, my next allocation will be 4MB, oops, I only needed another 128KB)”

As a final check of activity, I’ve also run a couple of queries against V$sql_workarea – a convenient memory structure you can examine to get some “recent history” of queries that have been using large amount of memory, or spilling to disc. In this case I’ve query the structure by sql_id for the two queries, reporting just a little detail about the last execution and memory usage.

SQL> select operation_id, operation_type, last_memory_used, last_execution  from V$sql_workarea where sql_id = 'dcc01q28gcbmy';

OPERATION_ID OPERATION_TYPE                   LAST_MEMORY_USED LAST_EXECUTION
------------ -------------------------------- ---------------- ----------------------------------------
           2 HASH-JOIN                                 1724416 OPTIMAL
           4 HASH-JOIN                                 1721344 OPTIMAL
           6 HASH-JOIN                                 1752064 OPTIMAL


SQL> select operation_id, operation_type, last_memory_used, last_execution  from V$sql_workarea where sql_id = 'b52uwjz07fwhk';

OPERATION_ID OPERATION_TYPE                   LAST_MEMORY_USED LAST_EXECUTION
------------ -------------------------------- ---------------- ----------------------------------------
           2 HASH-JOIN                                30930944 OPTIMAL
           3 HASH-JOIN                                30945280 OPTIMAL
           4 HASH-JOIN                                30650368 OPTIMAL

As you can see, the view reports each plan operation (with id and type) that needed memory for an SQL workarea – and the numbers confirm the 1.7MB and 29MB reported by the execution plans. (Which is not surprising since it’s exactly these figures that are used to supply the details in the plans.)

You need to be a little cautious with this view in older versions of Oracle – it used to do a “full tablescan” of the library cache, which resulted in a lot of library cache latch activity and could cause some contention if you ran it frequently, but it’s a very useful view for finding queries that are doing unexpectedly large sorts or hash joins, and one you might query occasionally if you see any space management threats in the temporary tablespace.

November 12, 2020

rowsets

Filed under: Execution plans,Infrastructure,Oracle,Performance — Jonathan Lewis @ 12:35 pm GMT Nov 12,2020

Here’s a little demonstration of the benefit of rowsets. It started with a very simple question that arrived in my in-box from Kaley Crum.

  • “Why does the query with the UNION ALL take so much longer than the query without the UNION ALL?”

Here are the two queries – surely they should take virtually the same amount of time.

select count(*)
from (select /*+ no_merge */
             object_id
      from drop_me);


select count(*)
from (select /*+ no_merge */
             object_id
      from drop_me
      --
      union all
      --
      select 1 from dual);

Look closely – the difference between the two queries is just the addition through a UNION ALL of “select 1 from dual”. The first query took about 3.5 seconds to complete (there were 70M rows in the table), and the second took nearly 69 seconds.

Here are the execution plans showing the rowsource execution statistics (and Query Block // Object Alias information) – first the baseline query:

QL_ID  0ph1vfuuxkbqb, child number 0
-------------------------------------
select count(*) from (select /*+ no_merge */              object_id    
  from drop_me)
 
Plan hash value: 3609429292
 
--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |      1 |        | 26465 (100)|          |      1 |00:00:03.34 |   95896 |  95891 |
|   1 |  SORT AGGREGATE     |         |      1 |      1 |            |          |      1 |00:00:03.34 |   95896 |  95891 |
|   2 |   VIEW              |         |      1 |     70M| 26465   (2)| 00:00:02 |     70M|00:00:03.32 |   95896 |  95891 |
|   3 |    TABLE ACCESS FULL| DROP_ME |      1 |     70M| 26465   (2)| 00:00:02 |     70M|00:00:03.27 |   95896 |  95891 |
--------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$2 / from$_subquery$_001@SEL$1
   3 - SEL$2 / DROP_ME@SEL$2

And then the UNION ALL query:

SQL_ID  0chdajr28y0ub, child number 0
-------------------------------------
select count(*) from (select /*+ no_merge */              object_id    
  from drop_me       --       union all       --       select 1 from
dual)
 
Plan hash value: 3408528233
 
---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |      1 |        | 26467 (100)|          |      1 |00:01:18.58 |   95896 |  95891 |
|   1 |  SORT AGGREGATE      |         |      1 |      1 |            |          |      1 |00:01:18.58 |   95896 |  95891 |
|   2 |   VIEW               |         |      1 |     70M| 26467   (2)| 00:00:02 |     70M|00:01:10.84 |   95896 |  95891 |
|   3 |    UNION-ALL         |         |      1 |        |            |          |     70M|00:00:53.13 |   95896 |  95891 |
|   4 |     TABLE ACCESS FULL| DROP_ME |      1 |     70M| 26465   (2)| 00:00:02 |     70M|00:00:19.28 |   95896 |  95891 |
|   5 |     FAST DUAL        |         |      1 |      1 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |
---------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SET$1 / from$_subquery$_001@SEL$1
   3 - SET$1
   4 - SEL$2 / DROP_ME@SEL$2
   5 - SEL$3 / DUAL@SEL$3

This is not a trick question – I’ve got the SQL to create the data set and run the test; and I’ve run the test through 19.3.0.0 and 12.2.0.1 with results very similar to the above. Here’s a slightly modified version of the script – you’ll notice the column name has changed because I’ve used my usual “large data set” generator rather than scaling up a clone of all_objects:

rem
rem     Script:         rowset_kaley.sql
rem     Author:         Kaley Crum / JP Lewis
rem     Dated:          Oct 2020
rem
rem     Last tested 
rem             19.3.0.0
rem

create table drop_me
pctfree 0
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum  n1
from
        generator
cross join
        generator
where rownum <= 7e7
;

alter session set statistics_level = all;

set timing on
set serveroutput off

prompt  ===========================================
prompt  Baseline: Simple query without a union all.
prompt  Runs in 3-4 seconds
prompt  ===========================================

select
         count(*)
from (select /*+ no_merge */
             n1
      from drop_me);

select *
from table(dbms_xplan.display_cursor(format => 'allstats last advanced'));

prompt  ==========================================
prompt  Add a UNION ALL for one record from dual.
Prompt  Runs in over a minute
prompt  ==========================================

pause Press return

select
         count(*)
from (select /*+ no_merge */
             n1
      from drop_me
      --
      union all
      --
      select 1 from dual);

select *
from table(dbms_xplan.display_cursor(format => 'allstats last advanced'));

There are two factors involved in the massive variation in performance. The first factor is one that the test case will report – though I didn’t show it in the output above, the second is that the test case has enabled rowsource execution statistics.

Here’s the big clue – from the Column Projection Information, which is one of the things that appears with the “advanced” format option (or when you add the “projection” format option) in the call to dbms_xplan.display_xxx()

First for the fast query:

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(*)[22]
   2 - (rowset=1019)
   3 - (rowset=1019)

The rowsource passes 1,019 rows at a time from the tablescan operation to the view operation and from the view operation to the sort operation, for a total of roughly 70,000 calls for each of those two steps of the plan.

Compare this to the projection information for the slow UNION ALL query:

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(*)[22]

I haven’t missed a bit in copying – this really is it. The rows move up the stack one at a time, not in a rowset array. That’s 70,000,000 subroutime calls for each of the two steps of the plan.

I’m sure most of us have heard the mantra “row by row is slow by slow” (or some variant on the theme). This is true all the way down to the internal levels of an execution plan.

Of course 70M calls vs. 70K calls shouldn’t really explain a difference of more than 60 seconds – but when the calls to the O/S for timing information for rowsource execution stats are repeated for each call (or even if it’s only a 1% sample of the calls) down the stack that’s where a lot of time can disappear.

On my laptop, running Oracle 19.3 in a VM, this is how my timing went:

  • Fast query: 1.94 seconds, dropping to 1.79 seconds when I disabled rowsource execution stats.
  • Slow query: 65.91 seconds, dropping to a far less astounding 3.26 seconds when I disabled rowsource execution stats.

So there really is a noticeable difference between row-by-row and array-processing but most of the difference in the original test came from using rowsource execution statistics to measure how much of a difference there would be.

It’s also worth mentioning that this is probably the most extreme case you could produce to show the difference – using the largest possible rowset size with the smallest possible rows when you want to mazimise the gap between (internal) array processing and single row processing – and then processing a very large number of rows. [But isn’t ridiculously large numbers of rows what you do with Exadata?].

If you want further evidence that the difference is due to the rowset size you can always alter session set “_rowsets_enabled”=false; and watch the fast query slow down. It will take about half the time of the slow query as it only has to pass 70M rows up one step of the plan rather than the two steps that are in the UNION ALL plan.

In theory it looks as if you could also restrict the size of the rowset by setting _rowsets_max_rows or _rowsets_target_maxsize, but since the default value for the former is 256 the results above suggest that the settings might be ignored, and when I tried adjusting them at the session level nothing changed.

Another test you could run is to adjust (alter session) the parameter “_rowsource_statistics_sampfreq”, When I set this to 100 the time for the slow query dropped to about 21 seconds (and the fast query – with rowsets disabled – dropped to about 11 seconds).

Footnote

It’s always easy to hit an accidental special case without realising it, so when you’re testing something it’s important to think about what’s special in the example. I came up with two possibilities in this case – dual is always going to be a very special case in any circumstances, and I’ve got a table in my query that is guaranteed to return exactly one row. So I repeated the test with:

  • a real table with one row and primary key in case the reference to dual was disabling rowsets
  • a real table with 100 rows in case this enabled a rowset size of 100

In neither case did Oracle start using array processing.

Footnote 2

One of the patterns I see occasionally follows the form of the slow query above. Some front-end tool executes a select statement that starts with a select {list of desired column headings} from dual so that the front-end code doesn’t have to be programmed to create headings as it receives the data.

Although the impact this would have on rowsets looks like a large percentage of the work done in the database in this case, in more realistic cases it would probably be irrelevant, until someone starts testing with the statistics_level set to all, or _rowsource_execution_statistics explicitly enabled.

There is a related effect, though, from the SQL Monitor feature (which activates automatically for a statement if it’s executing in parallel, or if it’s predicted to take more than 5 seconds to complete). When I added the /*+ monitor */ hint to the two queries (and didn’t fiddle with the rowsets parameter) the fast query averaged 2.26 seconds instead of 1.79 seconds, and the slow query averaged 3.51 seconds instead of 3.26 seconds. I have to say, though, that the variation between executions of the same statement was quite significant compared the variation produced by adding or eliminating the hint.

October 10, 2020

Interval Oddity

Filed under: CBO,Execution plans,Oracle,Partitioning — Jonathan Lewis @ 2:51 pm BST Oct 10,2020

Interval partitioning is a popular strategy for partitioning date-based data. It’s an enhanced variant of range partitioning that allows you to define a starting partition and an interval that should be used to derive the high values for all subsequent partitions – and Oracle doesn’t even have to create intervening partitions if you insert data that goes far beyond the current partition, it automatically creates exactly the right partition (with the correct high_value and correctly inferred lower boundary) for the incoming data and behaves as if the intervening partitions will become available when they’re needed at some later point in time. So no need for DBAs to work out actual partition high_values, no need to ensure that all the partitions you need out into the future have been pre-created, no more batch processes crashing with Oracle error ORA-14400: inserted partition key does not map to any partition.

But there’s a surprising difference between traditional range partitioning and the newer interval partitioning that will increase CPU usage in some cases and may (though I haven’t yet investigated this in sufficient detail to create an example) lead to variations in execution plans.

To demonstrate the difference I’m going to create two tables with the same structure and content then run the same query against them and show you the resulting execution plans. Here’s the code to create and populate the tables:

rem
rem     Script:         interval_or_range.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem
rem     Last tested 
rem             19.3.0.0
rem


create table t_interval(
        order_date      date not null,
        order_id        number(10,0) not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range (order_date) 
interval (numtoyminterval(1,'MONTH'))
subpartition by hash (order_id) subpartitions 4
        (
                partition start_p1 values less than (to_date('01-Jan-2020','dd-mon-yyyy'))
        )
;

create table t_range(
        order_date      date not null,
        order_id        number(10,0) not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range (order_date) 
subpartition by hash (order_id) subpartitions 4
        (
                partition start_p1 values less than (to_date('01-Jan-2020','dd-mon-yyyy')),
                partition start_p2 values less than (to_date('01-Feb-2020','dd-mon-yyyy')),
                partition start_p3 values less than (to_date('01-Mar-2020','dd-mon-yyyy')),
                partition start_p4 values less than (to_date('01-Apr-2020','dd-mon-yyyy')),
                partition start_p5 values less than (to_date('01-May-2020','dd-mon-yyyy')),
                partition start_p6 values less than (to_date('01-Jun-2020','dd-mon-yyyy')),
                partition start_p7 values less than (to_date('01-Jul-2020','dd-mon-yyyy')),
                partition start_p8 values less than (to_date('01-Aug-2020','dd-mon-yyyy')),
                partition start_p9 values less than (to_date('01-Sep-2020','dd-mon-yyyy'))
        )
;

insert into t_range select
        to_date('01-Jan-2020','dd-mon-yyyy') + rownum,
        rownum,
        lpad(rownum,10,'0'),
        lpad('x',100,'x')
from
        all_objects
where
        rownum <= 240 -- > comment to avoid wordpress format issue  
;

insert into t_interval select
        to_date('01-Jan-2020','dd-mon-yyyy') + rownum,
        rownum,
        lpad(rownum,10,'0'),
        lpad('x',100,'x')
from
        all_objects
where
        rownum <= 240 -- > comment to avoid wordpress format issue
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T_RANGE',
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T_INTERVAL',
                method_opt  => 'for all columns size 1'
        );
end;
/

I’ve used composite partitioned tables in this example but the same anomaly appears with simple partitioning – which you can test by commenting out the “subpartion by ..” lines. The two tables have partitions defined to hold a month’s data. I’ve inserted a few rows into most of the partitions, and ensured that I haven’t accidentally attempted to insert data that falls outside the legal range of the table with the predefined partitions .

To show that the tables are nominally identical here’s the list of partitions with their high values and number of rows:

column table_name       format a15
column partition_name   format a15
column high_value       format a80

break on table_name skip 1

select
        table_name, partition_name, num_rows, high_value
from
        user_tab_partitions
where
        table_name in ('T_INTERVAL','T_RANGE')
order by
        table_name, partition_name
;


ABLE_NAME      PARTITION_NAME    NUM_ROWS HIGH_VALUE
--------------- --------------- ---------- --------------------------------------------------------------------------------
T_INTERVAL      START_P1                 0 TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10722              30 TO_DATE(' 2020-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10723              29 TO_DATE(' 2020-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10724              31 TO_DATE(' 2020-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10725              30 TO_DATE(' 2020-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10726              31 TO_DATE(' 2020-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10727              30 TO_DATE(' 2020-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10728              31 TO_DATE(' 2020-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                SYS_P10729              28 TO_DATE(' 2020-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

T_RANGE         START_P1                 0 TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P2                30 TO_DATE(' 2020-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P3                29 TO_DATE(' 2020-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P4                31 TO_DATE(' 2020-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P5                30 TO_DATE(' 2020-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P6                31 TO_DATE(' 2020-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P7                30 TO_DATE(' 2020-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P8                31 TO_DATE(' 2020-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
                START_P9                28 TO_DATE(' 2020-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA


18 rows selected.

With this setup we can now run a simple query against the two tables using a where clause that is an exact match for the partition definition for the July data:

set serveroutput off

select 
        count(*) 
from 
        t_range 
where   order_date >= date '2020-07-01'  -- to_date('01-Jul-2020','dd-mon-yyyy')
and     order_date <  date '2020-08-01'  -- to_date('01-Aug-2020','dd-mon-yyyy')
/

select * from table(dbms_xplan.display_cursor);

select 
        count(*) 
from 
        t_interval 
where   order_date >= date '2020-07-01'  -- to_date('01-Jul-2020','dd-mon-yyyy')
and     order_date <  date '2020-08-01'  -- to_date('01-Aug-2020','dd-mon-yyyy')
/

select * from table(dbms_xplan.display_cursor);

I’ve left in two versions of the date predicates – the “ANSI” style, and a strictly formatted “to_char()” style that is still the approached used most commonly in Oracle systems. The effect on the execution plans is the same for both variations of the date declaration, and here are the two execution plans – reported from an instance of 19.3.0.0:

---------------------------------------------------------------------------------------------------
| Id  | Operation               | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |         |       |       |   130 (100)|          |       |       |
|   1 |  SORT AGGREGATE         |         |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|         |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
|   3 |    TABLE ACCESS FULL    | T_RANGE |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
---------------------------------------------------------------------------------------------------


------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            |       |       |   130 (100)|          |       |       |
|   1 |  SORT AGGREGATE         |            |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|            |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
|*  3 |    TABLE ACCESS FULL    | T_INTERVAL |    31 |   248 |   130   (3)| 00:00:01 |     8 |     8 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("ORDER_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DATE"<TO_DATE(' 2020-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

It’s quite possible that you won’t notice the difference these plans, even though I’ve put them directly one below the other; and it’s very likely that most people wouldn’t think about the difference if they didn’t have the two plans available at the same time to compare.

If you can’t spot the difference it’s because I’ve deliberately arranged them in a way that might fool you into not looking carefully enough.

If you’ve spotted the difference it’s probably because you’ve examined the Predicate Information section. There isn’t one for the the t_range example but there is one for the t_interval example – and it was a little sneaky of me to make it easy for you to assume that the one set of predicates I showed was common to the two plans.

Since the plans have been pulled from memory (v$sql_plan) it seems likely that they are truthful and the query based on the interval partitioning is actually checking every row in the partition against two date values. In my example that’s not going to make much difference to performance, but in a datawarehouse with millions of rows per partition there’s scope for a noticeable increase in CPU between the two queries – especially since the block reads are likely to be direct path, bypassing much of the usual (CPU -intensive) buffer cache activity.

Interestingly when I checked the 10053 trace files for the two queries the CPU cost for the pair was identical. Normally you expect to see at least a little CPU cost (even if it’s only tens of units out of millions) for checking a predicate. This raises the question – does the optimizer allow a cost for the t_range table for an event that isn’t going to happen, or is it reporting an event that isn’t going to happen for the t_interval table. [Edit: easy enough to check by doing the test on a simple heap table that clones the data from that one partition, and checking the calculated cost with and without the predicate]

There’s a further ramification to this anomaly, relating to the question on the Oracle-L list server that prompted the investigation. What happens on Exadata where the predicates can be offloaded to storage?

The original question wasn’t actually about the appearance (or not) of the predicates, it was about an unexpected cardinality estimate for a query involving two equi-partitioned tables, so this note has gone completely off-topic from the question; but another strange detail about the predicates showed up when I suggested the creation of a column group on the join columns. A (redundant) predicate that had been offloaded to storage stopped being offloaded; here, from the original posting with the original tables, are the two different Predicate Information sections that appeared – the first without the column group, the second after the column group had been created and its stats collected:

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ORDER_DETL"."ORDR_RQST_NUMBER"="ORDER_REQST"."ORDR_RQST_NUMBER" AND
              "ORDER_DETL"."ORDR_RQST_DATE"="ORDER_REQST"."ORDR_RQST_DATE")
   6 - storage("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   8 - storage("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ORDER_DETL"."ORDR_RQST_NUMBER"="ORDER_REQST"."ORDR_RQST_NUMBER" AND
              "ORDER_DETL"."ORDR_RQST_DATE"="ORDER_REQST"."ORDR_RQST_DATE")
   6 - storage("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<TO_DATE(' 2020-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_REQST"."ORDR_RQST_DATE"<TO_DATE(' 2020-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   8 - filter("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DETL"."ORDR_RQST_DATE"<TO_DATE(' 2020-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Notice how the storage() predicate that appears at operation 8 in the first set has disappeared from the second even though operation 6 manages to retain its storage() predicate throughout.

I’ve modelled a further example of odd behaviour using two pairs of tables – one pair using range/hash partitioning the other interval/hash partitioning. I won’t reproduce it here but the problem of redundant predicates appearing and then disappearing gets even stranger.

I haven’t yet produced an example where the unexpected predicate behaviour has affected the costs or cardinalities – but I’ve only spent a couple of hours playing around with well-formed examples: it’s possible that in badly formed examples (e.g. with statistical inconsistencies) the side effect could mean that two notionally identical queries produce different numbers and different plans because they end up with different predicates in the final transformed query.

Footnote

Following an email from David Kurtz, it occurred to me that I should have made it clear that the disappearance of predicates on the partition key is expected behaviour when the predicates are clearly synchronised with the partition boundaries. This is why it’s a good idea to avoid the between operator when dealing with partitions – the content of a partition is “greater than or equal to” the high_value of the previous partition and “strictly less than” its own high_value whereas between expands to “greater than or equal to and less than or equal to.

The predicate behaviour for the interval partitioning is the oddity, the behaviour for the “normal” range partitioning is expected.

October 1, 2020

Index FFS Cost

Filed under: Bugs,CBO,Execution plans,Indexing,Oracle — Jonathan Lewis @ 11:46 am BST Oct 1,2020

There are a number of unexpected issues with the optimizer’s treatment of the index fast full scan, the access path where Oracle ignores the structure of the B-tree and uses multiblock reads to do a brute-force segment scan as if the index were a “skinny table” with a few blocks of irrelevant garbage (i.e. the branch blocks) that could be ignored.

A recent comment made a few days on a blog about the optimizer’s “index-join” access path reminded me that I had a few notes to finish and publish that might help some people address performance issues relating to a couple of little-known fixes for optimizer oddities in this area. Today’s “fix” is one that appeared in 10g (or maybe very late in 9i) but never became the default optimizer behaviour (possibly because it’s one of those “not a bug” types of bug with no “always correct” strategy).

The problem revolves around the optimizer_index_cost_adj parameter (which is another reason why the fix might have been kept out of the standard code base). For index-only execution plans the parameter will apply to the index range scan, full scan, or skip scan, but not to the index fast full scan. Here’s a model to demonstrate this:


rem
rem     Script:         oica_iffs.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2012
rem
rem     Last tested
rem             19.3.0.0
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        lpad(trunc(dbms_random.value(0,10000)),10)      v1,
        rpad('x',20)                                    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid wordpress format issue
;

create index t1_i1 on t1(v1);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                cascade          => true,
                method_opt       => 'for all columns size 1'
        );
end;
.

select blocks from user_tables where table_name = 'T1';
select leaf_blocks from user_indexes where index_name = 'T1_I1';

I’ve created a fairly narrow table with 1M rows, and created an index on that table on the v1 column. In my test there were 5,219 blocks in the table and 3,068 blocks in the index.

I’m now going to run a query to count the rows in the table where v1 is not null and report its execution plan. The first run will be unhinted, the second run will include a hint that forces an index full scan, and the third run will include a hint to force an index fast full scan.


prompt  ==============
prompt  Unhinted query
prompt  ==============

select
        count(*)
from
        t1
where
        v1 is not null
;

prompt  ==========
prompt  Index hint
prompt  ==========

select
        /*+ index(t1) */
        count(*)
from
        t1
where
        v1 is not null
;

prompt  ==============
prompt  index_ffs hint
prompt  ==============

select
        /*+ index_ffs(t1) */
        count(*)
from
        t1
where
        v1 is not null
;

Here are the three execution plans. As you might have predicted the default plan (for my Oracle 19.3) is the index fast full scan:


==============
Unhinted query
==============
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   450  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|   450  (15)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1" IS NOT NULL)

==========
Index hint
==========
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    11 |  3159   (3)| 00:00:01 |
|   1 |  SORT AGGREGATE  |       |     1 |    11 |            |          |
|*  2 |   INDEX FULL SCAN| T1_I1 |  1000K|    10M|  3159   (3)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1" IS NOT NULL)

==============
index_ffs hint
==============
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   450  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|   450  (15)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1" IS NOT NULL)

Since the size of the index is less than the size of the table the cost of the index fast full scan will be lower than the cost of a tablescan; and since an index fast full scan does multiblock reads while the index full scan is assumed to use single block reads the cost of the index fast full scan will be lower than the cost of the index full scan.

Now we set the optimizer_index_cost_adj to 1 (as happens in a number of “legacy” commercial products) and repeat the experiment. Here are the three plans (reduced to just the body of the plan):

==================
OICA = 1, Unhinted
==================
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    11 |    32   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE  |       |     1 |    11 |            |          |
|*  2 |   INDEX FULL SCAN| T1_I1 |  1000K|    10M|    32   (4)| 00:00:01 |
--------------------------------------------------------------------------

====================
OICA = 1, index hint
====================
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    11 |    32   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE  |       |     1 |    11 |            |          |
|*  2 |   INDEX FULL SCAN| T1_I1 |  1000K|    10M|    32   (4)| 00:00:01 |
--------------------------------------------------------------------------

========================
OICA = 1, index_ffs hint
========================
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   450  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|   450  (15)| 00:00:01 |
-------------------------------------------------------------------------------

The default plan is now the index full scan, not the index fast full scan; and when we look at the two hinted plans we can see why. The full scan cost has been reduced to 1% of the original cost (as dictated by the setting for optimizer_index_cost_adj); the fast full scan cost has not been adjusted.

So is this a good thing or a bad thing? Different people may answer that question differently. There are clearly likely to be cases where changing the setting for the optimizer_index_cost_adj (which you shouldn’t have been doing anyway for the last 20 years or so) will result in some plans switching from fast full scans to full scans with unfortunate consequences;

On the other hand when you do a fast full scan it may do direct path reads, while the full scan will do cached reads, and the cache may be holding a lot of the index leaf blocks already, so some people might prefer the current behaviour;

On the other, other hand an Exadata system can use storage indexes when doing direct path segment scans (even when the segment is an index) so an Exadata fast full scan might be far more desirable than a partially cached full scan.

So no generic right answer, only specific answers for particular sets of circumstances.

Solutions

This issue was addressed some time in the 10g timeline- my notes say the earliest successful tests I did of the fix were on 10.2.0.4, and that it wasn’t available in 9.2.0.8. There are two possibilities – a fix control, and an event.

  • Fix control: 4483286 – descibed as “Discount FFS cost using optimizer_index_cost_adj”
  • Event: 38085 at level one. I don’t think this event has ever appeared in the oraus.msg file, but it is referenced by bug 4483286

Of the two, setting the fix control is probably the safer option since the event is (apparently) non-existent and there’s always the possibility that the number will be hi-jacked in a future release to do something completely different. It’s also worth noting that the fix control can be applied to an individual statement through the /*+ opt_param() */ hint (there’s an example of the syntax towards the end of another article of mine).

A little demonstration:


-- alter session set events '38085 trace name context forever, level 1';
alter session set "_fix_control"='4483286:1';
alter session set optimizer_index_cost_adj = 1;

set autotrace traceonly explain

prompt  ==============
prompt  Unhinted query
prompt  ==============

select
        count(*)
from
        t1
where
        v1 is not null
;

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|     4   (0)| 00:00:01 |
-------------------------------------------------------------------------------

With the event or _fix_control set the cost of the index fast full scan drops to 1%, and the path reappears without being hinted.

September 30, 2020

Serial Bloom

Filed under: Execution plans,Oracle,Parallel Execution,Performance — Jonathan Lewis @ 1:35 pm BST Sep 30,2020

Following the recent note I wrote about an enhancement to the optimizer’s use of Bloom filters, I received a question by email asking about the use of Bloom filters in serial execution plans:

I’m having difficulty understanding the point of a Bloom filter when used in conjunction with a hash join where everything happens within the same process.

I believe you mentioned in your book (Cost Based Oracle) that hash joins have a mechanism similar to a Bloom filter where a row from the probe table is checked against a bitmap, where each hash table bucket is indicated by a single bit. (You have a picture on page 327 of the hash join and bitmap, etc).

The way that bitmap looks and operates appears to be similar to a Bloom filter to me…. So it looks (to me) like hash joins have a sort of “Bloom filter” already built into them.

My question is… What is the advantage of adding a Bloom filter to a hash join if you already have a type of Bloom filter mechanism thingy built in to hash joins?

I can understand where it would make sense with parallel queries having to pass data from one process to another, but if everything happens within the same process I’m just curious where the benefit is.

 

The picture on page 327 of CBO-F is a variation on the following, which is the penultimate snapshot of the sequence of events in a multi-pass hash join. The key feature is the in-memory bitmap at the top of the image describing which buckets in the (partitioned and spilled) hash table hold rows from the build table. I believe that it is exactly this bitmap that is used as the Bloom filter.

The question of why it might be worth creating and using a Bloom filter in a simple serial hash join is really a question of scale. What is the marginal benefit of the Bloom filter when the basic hash join mechanism is doing all the hash arithmetic and comparing with a bitmap anyway?

If the hash join is running on an Exadata machine then the bitmap can be passed as a predicate to the cell servers and the hash function can be used at the cell server to minimise the volume of data that has to be passed back to the database server – with various optimisations dependent on the version of the Exadata software. Clearly minimising traffic through the interconnect is going to have some benefit.

Similarly, as the email suggests, for a parallel query where (typically) one set of parallel processes will read the probe table and distribute the data to the second set of parallel processes which then do the hash join it’s clearly sensible to allow the first set of procsses to apply the hash function and discard as many rows as possible before distributing the survivors – minimising inter-process communication.

In both these cases, of course, there’s a break point to consider of how effective the Bloom filter needs to be before it’s worth taking advantage of the technology. If the Bloom filter allows 99 rows out of every hundred to be passed to the database server / second set of parallel processes then Oracle has executed the hash function and checked the bitmap 100 times to avoid sending one row (and it will (may) have to do the same hash function and bitmap check again to perform the hash join); on the other hand if the Bloom filter discards 99 rows and leaves only one row surviving then that’s a lot of traffic eliminated – and that’s likely to be a good thing. This is why there are a few hidden parameters defining the boundaries of when Bloom filters should be used – in particular there’s a parameter “_bloom_filter_ratio” which defaults to 35 and is, I suspect, a figure which says something like “use Bloom filtering only if it’s expected to reduce the probe data to 35% of the unfiltered size”.

So the question then becomes: “how could you benefit from a serial Bloom filter when it’s the same process doing everything and there’s no “long distance” traffic going on between processes?” The answer is simply that we’re operating at a much smaller scale. I’ve written blog notes in the past where the performance of a query depends largely on the number of rows that are passed up a query plan before being eliminated (for example here, where the volume of data moving results in a significant fraction of the total time).

If you consider a very simple hash join its plan is going to be shaped something like this:


-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    45 |   720 |    31 |
|*  1 |  HASH JOIN         |      |    45 |   720 |    31 |
|*  2 |   TABLE ACCESS FULL| T2   |    15 |   120 |    15 |
|   3 |   TABLE ACCESS FULL| T1   |  3000 | 24000 |    15 |
-----------------------------------------------------------

If you read Tanel Poder’s article on execution plans as a tree of Oracle function calls you’ll appreciate that you could translate this into informal English along the lines of:

  • Operation 1 calls a function (at operation 2) to do a tablescan of t1 and return all the relevant rows, building an in-memory hash table by applying a hashing function to the join column(s) of each row returned by the call to the tablescan. As the hash table is populated the operation also constructs a bitmap to flag buckets in the hash table that have been populated.
  • Operation 1 then calls a function (at operation 3) to start a tablescan and then makes repeated calls for it to return one row (or, in newer versions, a small rowset) at a time from table t2. For each row returned operation 1 applies the same hash function to the join column(s) and checks the bitmap to see if there’s a potential matching row in the relevant bucket of the hash table, and if there’s a potential match Oracle examines the actual contents of the bucket (which will be stored as a linked list) to see if there’s an actual match.

Taking the figures above, let’s imagine that Oracle is using a rowset size of 30 rows. Operation 1 will have to make 100 calls to Operation 3 to get all the data, and call the hashing function 3,000 times.  A key CPU component of the work done is that the function represented by operation 3 is called 100 times and (somehow) allocates and fills an array of 30 entries each time it is called.

Now assume operation 1 passes the bitmap to operation 3 as an input and it happens to be a perfect bitmap. Operation 3 starts its tablescan and will call the hash function 3,000 times, but at most 45 rows will get past the bitmap. So operation 1 will only have to call operation 3 twice.  Admittedly operation 1 will (possibly) call the hash function again for each row – but maybe operation 3 will supply the hash value in the return array. Clearly there’s scope here for a trade-off between the reduction in work due to the smaller number of calls and the extra work needed to take advantage of the bitmap technology.

Here’s an example that shows the potential for savings – if you want to recreate this test you’ll need about 800MB of free space in the database, the first table takes about 300MB and the second about 450MB.


rem
rem     Script:         bloom_filter_serial_02.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,30,'0')             v1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e7 -- > comment to avoid WordPress format issue
;

create table t2
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        round(rownum + 0.5,2)           id,
        mod(rownum,1e5)                 n1,
        lpad(rownum,10)                 v1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e7 -- > comment to avoid WordPress format issue
;


prompt  =================
prompt  With Bloom filter
prompt  =================

select 
        /*+ 
                px_join_filter(t1) 
                monitor
        */
        t1.v1, t2.v1
from 
        t2, t1
where 
        t2.n1 = 0
and 
        t1.id = t2.id
/

prompt  ===============
prompt  No Bloom filter
prompt  ===============

select 
        /*+
                monitor
        */
        t1.v1, t2.v1
from 
        t2, t1
where 
        t2.n1 = 0
and 
        t1.id = t2.id
/

I’ve created tables t1 and t2 with an id column that never quite matches, but the range of values is set so that the optimizer thinks the two tables might have a near-perfect 1 to 1 match. I’ve given t2 an extra column with 105 distinct values in its 107 rows, so it’s going to have 100 rows per distinct value. Then I’ve presented the optimizer with a query that looks as if it’s going to find 100 rows in t2 and needs to find a probable 100 rows of matches in t1. For my convenience, and to highlight a couple of details of Bloom filters, it’s not going to find any matches.

In both runs I’ve enabled the SQL Monitor feature with the /*+ monitor */ hint, and in the first run I’ve also hinted the use of a Bloom filter. Here are the resulting SQL Monitor outputs. Bear in mind we’re looking at a reasonably large scale query – volume of input data – with a small result set.

First without the Bloom filter:


Global Stats
================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
================================================================
|    3.00 |    2.24 |     0.77 |     1 |  96484 |  773 | 754MB |
================================================================

SQL Plan Monitoring Details (Plan Hash Value=2959412835)
==================================================================================================================================================
| Id |      Operation       | Name |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity | Activity Detail |
|    |                      |      | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |   (# samples)   |
==================================================================================================================================================
|  0 | SELECT STATEMENT     |      |         |       |         2 |     +2 |     1 |        0 |      |       |     . |          |                 |
|  1 |   HASH JOIN          |      |     100 | 14373 |         2 |     +2 |     1 |        0 |      |       |   2MB |          |                 |
|  2 |    TABLE ACCESS FULL | T2   |      99 |  5832 |         2 |     +1 |     1 |      100 |  310 | 301MB |     . |          |                 |
|  3 |    TABLE ACCESS FULL | T1   |     10M |  8140 |         2 |     +2 |     1 |      10M |  463 | 453MB |     . |          |                 |
==================================================================================================================================================

According to the Global Stats the query has taken 3 seconds to complete, of which 2.24 seconds is CPU. (The 750MB read in 0.77 second would be due to the fact that I’m running off SSD, and I’ve got a 1MB read size that helps). A very large fraction of the CPU appears because of the number of calls from operation 1 to operation 3 (the projection information pulled from memory reports a rowset size of 256 rows, so that’s roughly 40,000 calls to the function.

When we force the use of a Bloom filter the plan doesn’t change much (though the creation and use of the Bloom filter has to be reported) – but the numbers do change quite significantly.

Global Stats
================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
================================================================
|    1.97 |    0.99 |     0.98 |     1 |  96484 |  773 | 754MB |
================================================================

SQL Plan Monitoring Details (Plan Hash Value=4148581417)
======================================================================================================================================================
| Id |       Operation       |  Name   |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity | Activity Detail |
|    |                       |         | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |   (# samples)   |
======================================================================================================================================================
|  0 | SELECT STATEMENT      |         |         |       |         1 |     +1 |     1 |        0 |      |       |     . |          |                 |
|  1 |   HASH JOIN           |         |     100 | 14373 |         1 |     +1 |     1 |        0 |      |       |   1MB |          |                 |
|  2 |    JOIN FILTER CREATE | :BF0000 |      99 |  5832 |         1 |     +1 |     1 |      100 |      |       |     . |          |                 |
|  3 |     TABLE ACCESS FULL | T2      |      99 |  5832 |         1 |     +1 |     1 |      100 |  310 | 301MB |     . |          |                 |
|  4 |    JOIN FILTER USE    | :BF0000 |     10M |  8140 |         1 |     +1 |     1 |    15102 |      |       |     . |          |                 |
|  5 |     TABLE ACCESS FULL | T1      |     10M |  8140 |         1 |     +1 |     1 |    15102 |  463 | 453MB |     . |          |                 |
======================================================================================================================================================


In this case, the elapsed time dropped to 1.97 seconds (depending on your viewpoint that’s either a drop of “only 1.03 seconds” or drop of “an amazing 34.3%”; with the CPU time dropping from 2.24 seconds to 0.99 seconds (55.8% drop!)

In this case you’ll notice that the tablescan of t1 produced only 15,102 rows to pass up to the hash join at operation 1 thanks to the application of the predicate (not reported here): filter(SYS_OP_BLOOM_FILTER(:BF0000,”T1″.”ID”)). Instead of 40,000 calls for the next rowset the hash function has been applied during the tablescan and operation 5 has exhausted the tablescan after only about 60 calls. This is what has given us the (relatively) significant saving in CPU.

This example of the use of a Bloom filter highlights up the two points I referred to earlier.

  • First, although we see operations 4 and 5 as Join (Bloom) filter use and Table access full respectively I don’t think the data from the tablescan is being “passed up” from operation 5 to 4; I believe operation 4 can be views as a “placeholder” in the plan to allow us to see the Bloom filter in action, the hashing and filtering actually happening during the tablescan.
  • Secondly, we know that there are ultimately no rows in the result set, yet the application of the Bloom filter has not eliminated all the data. Remember that the bitmap that Oracle constructs of the hash table identifies used buckets, not actual values. Those 15,102 rows are rows that “might” find a match in the hash table because they belong in buckets that are flagged. A Bloom filter won’t discard any data that is needed, but it might fail to eliminate data that subsequently turns out to be unwanted.

How parallel is parallel anyway?

I’ll leave you with one other thought. Here’s an execution plan from 12c (12.2.0.1) which joins three dimension tables to a fact table. There are 343,000 rows in the fact table and the three joins individually identify about 4 percent of the data in the table. In a proper data warehouse we might have been looking at a bitmap star transformation solution for this query, but in a mixed system we might want to run warehouse queries against normalised data – this plan shows what Bloom filters can do to minimise the workload. The plan was acquired from memory after enabling rowsource execution statistics:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |      1 |        |        |      |            |      1 |00:00:00.05 |      22 |      3 |       |       |          |
|   1 |  SORT AGGREGATE              |          |      1 |      1 |        |      |            |      1 |00:00:00.05 |      22 |      3 |       |       |          |
|   2 |   PX COORDINATOR             |          |      1 |        |        |      |            |      2 |00:00:00.05 |      22 |      3 | 73728 | 73728 |          |
|   3 |    PX SEND QC (RANDOM)       | :TQ10000 |      0 |      1 |  Q1,00 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   4 |     SORT AGGREGATE           |          |      2 |      1 |  Q1,00 | PCWP |            |      2 |00:00:00.09 |    6681 |   6036 |       |       |          |
|*  5 |      HASH JOIN               |          |      2 |     26 |  Q1,00 | PCWP |            |     27 |00:00:00.09 |    6681 |   6036 |  2171K|  2171K|     2/0/0|
|   6 |       JOIN FILTER CREATE     | :BF0000  |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|*  7 |        TABLE ACCESS FULL     | T3       |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|*  8 |       HASH JOIN              |          |      2 |    612 |  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6634 |   6026 |  2171K|  2171K|     2/0/0|
|   9 |        JOIN FILTER CREATE    | :BF0001  |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|* 10 |         TABLE ACCESS FULL    | T2       |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|* 11 |        HASH JOIN             |          |      2 |  14491 |  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6614 |   6022 |  2171K|  2171K|     2/0/0|
|  12 |         JOIN FILTER CREATE   | :BF0002  |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|* 13 |          TABLE ACCESS FULL   | T1       |      2 |      3 |  Q1,00 | PCWP |            |      6 |00:00:00.01 |      20 |      4 |       |       |          |
|  14 |         JOIN FILTER USE      | :BF0000  |      2 |    343K|  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6594 |   6018 |       |       |          |
|  15 |          JOIN FILTER USE     | :BF0001  |      2 |    343K|  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6594 |   6018 |       |       |          |
|  16 |           JOIN FILTER USE    | :BF0002  |      2 |    343K|  Q1,00 | PCWP |            |     27 |00:00:00.08 |    6594 |   6018 |       |       |          |
|  17 |            PX BLOCK ITERATOR |          |      2 |    343K|  Q1,00 | PCWC |            |     27 |00:00:00.08 |    6594 |   6018 |       |       |          |
|* 18 |             TABLE ACCESS FULL| T4       |     48 |    343K|  Q1,00 | PCWP |            |     27 |00:00:00.05 |    6594 |   6018 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

It’s a parallel plan, but it’s used the 12c “PQ_REPLICATE” strategy. The optimizer has decided that all the dimension tables are so small that it’s going to allow every PX process to read every (dimension) table through the buffer cache and build its own hash tables from them. (In earlier versions you might have seen the query coordinator scanning and broadcasting the three small tables, or one set of PX processes scanning and broadcasting to the other set).

So every PX process has an in-memory hash table of all three dimension tables and then (operation 17) they start a tablescan of the fact table, picking non-overlapping rowid ranges to scan. But since they’ve each created three in-memory hash tables they’ve also been able to create three Bloom filters each, which can all be applied simultaneously as the tablescan takes place; so instead of 343,000 rows being passed up the plan and through the first hash join (where we see from operation 11 that the number of surviving rows would have been about 14,500 ) we see all but 27 rows discarded very early on in the processing. Like bitmap indexes part of the power of Bloom filters lies in the fact that with the right plan the optimizer can combine them and identify a very small data set very precisely, very early.

The other thing I want you to realise about this plan, though, is that it’s not really an “extreme” parallel plan. It’s effectively running as a set of concurrent, non-interfering, serial plans. Since I was running (parallel 2) Oracle started just 2 PX processes: they both built three hash tables from the three dimension tables then split the fact table in half and took half each to do all the joins, and passed the nearly complete result to the query co-ordinator at the last moment. That’s as close as you can get to two serial, non-interfering, queries and still call it a parallel query. So, if you wonder why there might be any benefit in serial Bloom filters – Oracle’s actually being benefiting from it under the covers for several years.

Summary

Bloom filters trade a decrease in messaging against an increase in preparation and hashing operations. For Exadata systems with predicate offloading it’s very easy to see the potential benefit; for general parallel execution; it’s also fairly easy to see the potential benefit for parallel query execution what inter-process message between two sets of PX processes can be resource intensive; but even for serial queries there can be some benefit though, in absolute terms, they are likely to be only a small saving in CPU.

 

September 9, 2020

Bloom Upgrade

Filed under: 18c,CBO,Joins,Oracle,Tuning — Jonathan Lewis @ 5:18 pm BST Sep 9,2020

It’s a common pattern of Oracle features that they start with various restrictions or limitations that disappear over time. This note is about an enhancement to Bloom filter processing that appeared in the 18.1 optimizer and, for some people, may be a good enough reason for upgrading to a newer version of Oracle. This enhancement came to my attention by way of the Oracle Developer forum in a thread with the title Bloom filters and view using UNION ALL asking how to get a Bloom filter pushed inside a UNION ALL view. The original requirement wasn’t a trivial one so I’ll demonstrate the problem with a very simple example – first the data set:

rem
rem     Script:         bloom_pushdown.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2020
rem
rem     Last tested:
rem             19.3.0.0
rem

create table t1 as select * from all_objects where rownum <= 50000;
create table t2 as select t1.* from t1, (select rownum n1 from dual connect by level <= 4);
create table t3 as select t1.* from t1, (select rownum n1 from dual connect by level <= 4); -- > comment to avoid wordpress format issue

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns object_type size 254'
        );
end;
/

I’ve been a bit lazy here, copying data from view all_objects. I’ve gathered stats on t1 so that I can generate a histogram on the object_type column because I’m going to query for a rare object_type and I want the optimizer to get a reasonable estimate of rows. I’m going to hint a parallel query to join t1 to t2 (aliased, trivially, as v1 for reasons that will become apparent soon):

select
        /*+ 
                parallel(2) 
        */
        t1.object_name, v1.object_name
from
        t1,
        t2 v1
where
        t1.object_type = 'SCHEDULE'
and     v1.object_id = t1.object_id
/

In my case the optimizer chooses to do a hash join between these two table, and creates a Bloom filter to try and minimise the data passing through the data flow operation. The result set in my 12.2.0.1 database is only 16 rows, so it would be nice if the parallel scan could eliminate most of the 200,000 rows in t2 early – here’s the execution plan pulled from memory after running the query with rowsource execution stats enabled:


----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |      1 |        |   371 (100)|     16 |00:00:00.06 |      20 |      0 |       |       |          |
|   1 |  PX COORDINATOR        |          |      1 |        |            |     16 |00:00:00.06 |      20 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)  | :TQ10000 |      0 |     16 |   371   (5)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN           |          |      2 |     16 |   371   (5)|     16 |00:00:00.05 |    6278 |   3988 |  1250K|  1250K|     2/0/0|
|   4 |     JOIN FILTER CREATE | :BF0000  |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    2034 |      0 |       |       |          |
|*  5 |      TABLE ACCESS FULL | T1       |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    2034 |      0 |       |       |          |
|   6 |     JOIN FILTER USE    | :BF0000  |      2 |    200K|   292   (4)|     16 |00:00:00.04 |    4244 |   3988 |       |       |          |
|   7 |      PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|     16 |00:00:00.04 |    4244 |   3988 |       |       |          |
|*  8 |       TABLE ACCESS FULL| T2       |     32 |    200K|   292   (4)|     16 |00:00:00.03 |    4244 |   3988 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   5 - filter("T1"."OBJECT_TYPE"='SCHEDULE')
   8 - access(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"V1"."OBJECT_ID"))

We see that Oracle has generated a Bloom filter at operation 4 from the data returned from t1 at operation 5, and then used that Bloom filter at operation 6 to eliminate most of the data from t2 before passing the remaining few rows up to the hash join.

Let’s make the query more interesting – what if you want to use a UNION ALL of t2 and t3 in the query (for example one might be “current data” while the other is “historic data”. Here’s the query and plan from 12.2.0.1:

select
        /*+ 
                parallel(2) 
        */
        t1.object_name, v1.object_name
from
        t1,
        (select * from t2 union all select * from t3) v1
where
        t1.object_type = 'SCHEDULE'
and     v1.object_id = t1.object_id
/

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |   667 (100)|     32 |00:00:00.37 |      40 |      0 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |            |     32 |00:00:00.37 |      40 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 |      0 |     32 |   667   (5)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN            |          |      1 |     32 |   667   (5)|     32 |00:00:00.34 |    5125 |   3860 |  1250K|  1250K|     2/0/0|
|*  4 |     TABLE ACCESS FULL   | T1       |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    2034 |      0 |       |       |          |
|   5 |     VIEW                |          |      2 |    400K|   584   (4)|    400K|00:00:00.52 |    8488 |   7976 |       |       |          |
|   6 |      UNION-ALL          |          |      2 |        |            |    400K|00:00:00.24 |    8488 |   7976 |       |       |          |
|   7 |       PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|    200K|00:00:00.11 |    4244 |   3988 |       |       |          |
|*  8 |        TABLE ACCESS FULL| T2       |     32 |    200K|   292   (4)|    200K|00:00:00.07 |    4244 |   3988 |       |       |          |
|   9 |       PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|    200K|00:00:00.11 |    4244 |   3988 |       |       |          |
|* 10 |        TABLE ACCESS FULL| T3       |     32 |    200K|   292   (4)|    200K|00:00:00.03 |    4244 |   3988 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   4 - filter("T1"."OBJECT_TYPE"='SCHEDULE')
   8 - access(:Z>=:Z AND :Z<=:Z)      -- > edit to avoid wordpress format issue
  10 - access(:Z>=:Z AND :Z<=:Z)      -- > edit to avoid wordpress format issue

No Bloom filter – so all 400,000 rows feed up the plan and through the hash join. This won’t matter too much for my sub-second tiny data set but on a pair of 50GB tables, with the potential to offload the Bloom filter to storage in Exadata and, perhaps, eliminate 99% of the data at the cell servers, this could make a huge difference to performance.

Since Bloom filters are all about hashing data (in Oracle the standard Bloom filter is the bitmap summarising the build table in a hash join) let’s trying pushing the optimizer into a hash distribution for the parallel join to see if that had any effect:


select
        /*+ 
                parallel(2) 
                gather_plan_statistics
                leading(@sel$1 t1@sel$1 v1@sel$1)
                use_hash(@sel$1 v1@sel$1)
                pq_distribute(@sel$1 v1@sel$1 hash hash)
        */
        t1.object_name, v1.object_name
from
        t1,
        (select * from t2 union all select * from t3) v1
where
        t1.object_type = 'SCHEDULE'
and     v1.object_id = t1.object_id
/

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |      1 |        |   667 (100)|     32 |00:00:00.43 |      60 |      0 |       |       |          |
|   1 |  PX COORDINATOR             |          |      1 |        |            |     32 |00:00:00.43 |      60 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)       | :TQ10002 |      0 |     32 |   667   (5)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN BUFFERED       |          |      1 |     32 |   667   (5)|     32 |00:00:00.38 |    4000 |   3752 |  2290K|  2082K|     2/0/0|
|   4 |     JOIN FILTER CREATE      | :BF0000  |      2 |      4 |    75   (4)|      8 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX RECEIVE             |          |      2 |      4 |    75   (4)|      8 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       PX SEND HYBRID HASH   | :TQ10000 |      0 |      4 |    75   (4)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   7 |        STATISTICS COLLECTOR |          |      2 |        |            |      4 |00:00:00.01 |    1517 |      0 |       |       |          |
|   8 |         PX BLOCK ITERATOR   |          |      2 |      4 |    75   (4)|      4 |00:00:00.01 |    1517 |      0 |       |       |          |
|*  9 |          TABLE ACCESS FULL  | T1       |     26 |      4 |    75   (4)|      4 |00:00:00.01 |    1517 |      0 |       |       |          |
|  10 |     PX RECEIVE              |          |      2 |    400K|   584   (4)|     66 |00:00:00.77 |    8488 |   7976 |       |       |          |
|  11 |      PX SEND HYBRID HASH    | :TQ10001 |      2 |    400K|   584   (4)|     66 |00:00:00.77 |    8488 |   7976 |       |       |          |
|  12 |       JOIN FILTER USE       | :BF0000  |      2 |    400K|   584   (4)|     66 |00:00:00.77 |    8488 |   7976 |       |       |          |
|  13 |        VIEW                 |          |      2 |    400K|   584   (4)|    400K|00:00:00.68 |    8488 |   7976 |       |       |          |
|  14 |         UNION-ALL           |          |      2 |        |            |    400K|00:00:00.59 |    8488 |   7976 |       |       |          |
|  15 |          PX BLOCK ITERATOR  |          |      2 |    200K|   292   (4)|    200K|00:00:00.18 |    4244 |   3988 |       |       |          |
|* 16 |           TABLE ACCESS FULL | T2       |     32 |    200K|   292   (4)|    200K|00:00:00.06 |    4244 |   3988 |       |       |          |
|  17 |          PX BLOCK ITERATOR  |          |      2 |    200K|   292   (4)|    200K|00:00:00.12 |    4244 |   3988 |       |       |          |
|* 18 |           TABLE ACCESS FULL | T3       |     32 |    200K|   292   (4)|    200K|00:00:00.08 |    4244 |   3988 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   9 - access(:Z>=:Z AND :Z<=:Z)   -- > edit to avoid wordpress format issue 
       filter("T1"."OBJECT_TYPE"='SCHEDULE') 
  16 - access(:Z>=:Z AND :Z<=:Z)   -- > edit to avoid wordpress format issue
  18 - access(:Z>=:Z AND :Z<=:Z)   -- > edit to avoid wordpress format issue

We’ve managed to introduce a Bloom filter (which is visible as :BF0000 in the plan, even through there’s no reference to sys_op_bloom_filter() in the predicate information) but there’s a problem, we’re still passing 400,000 rows up the plan and the Bloom filter is only being applied at (or just after) the VIEW operator, discarding all but 66 rows before doing the hash join. It’s an improvement but not ideal; we’d like to see the Bloom filter applied to each of the two tables separately to eliminate rows as early as possible.

This can’t be done in 12.2, and you’d have to rewrite the query, changing a “join with union” into a “union of joins”, and that’s not really a desirable strategy.

Next Steps

Searching MOS, though you will be able to find the following note:

Doc ID 18849313.8 – ENH : bloom filters/pruning are pushed through union-all view

There’s an enhancement request to do what we want in 18.1, and the enhancement has got into the software. Here’s the (unhinted) plan from 19.3 (the plan stays the same when optimizer_features_enable is set back to 18.1.0, but drops back to the 12.1. plan when OFE is set to 12.2.0.1):

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |   666 (100)|     32 |00:00:00.11 |      10 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |     32 |00:00:00.11 |      10 |      0 | 73728 | 73728 |          |
|   2 |   PX SEND QC (RANDOM)    | :TQ10000 |      0 |     32 |   666   (5)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|*  3 |    HASH JOIN             |          |      2 |     32 |   666   (5)|     32 |00:00:00.05 |   10020 |   7958 |  1250K|  1250K|     2/0/0|
|   4 |     JOIN FILTER CREATE   | :BF0000  |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    1998 |      0 |       |       |          |
|*  5 |      TABLE ACCESS FULL   | T1       |      2 |      4 |    75   (4)|      8 |00:00:00.01 |    1998 |      0 |       |       |          |
|   6 |     VIEW                 |          |      2 |    400K|   583   (4)|     32 |00:00:00.04 |    8022 |   7958 |       |       |          |
|   7 |      UNION-ALL           |          |      1 |        |            |     12 |00:00:00.02 |    4011 |   3979 |       |       |          |
|   8 |       JOIN FILTER USE    | :BF0000  |      2 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|   9 |        PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|* 10 |         TABLE ACCESS FULL| T2       |     32 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|  11 |       JOIN FILTER USE    | :BF0000  |      2 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|  12 |        PX BLOCK ITERATOR |          |      2 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
|* 13 |         TABLE ACCESS FULL| T3       |     32 |    200K|   292   (4)|     16 |00:00:00.02 |    4011 |   3979 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   5 - filter("T1"."OBJECT_TYPE"='SCHEDULE')
  10 - access(:Z>=:Z AND :Z<=:Z)      -- > edit to avoid wordpress format issue
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."OBJECT_ID")) 
  13 - access(:Z>=:Z AND :Z<=:Z)      -- > edit to avoid wordpress format issue
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T3"."OBJECT_ID"))

As you can see, we create a Bloom filter at operation 4, and use it twice at operations 8 and 11 – with the sys_op_bloom_filter() functions clearly visible in the predicate information showing us that the Bloom filter is applied to the object_id column in both cases.

If you want to disable this enhancement for some reasons there are two hidden parameters available (which you might set for a single query using the opt_param() hint):

  • _bloom_filter_setops_enabled = true
  • _bloom_pruning_setops_enabled = true

The first is for Bloom filters in the situation shown, I assume the second deals with Bloom filters for partition pruning.

Summary

In versions prior to 18.1 the optimizer is unable to push Bloom filters down to the individual tables in a UNION ALL view, but this limitation was removed in the 18.1 code set.

 

February 18, 2020

Interval Partition(s)

Filed under: Oracle,Parallel Execution,Partitioning — Jonathan Lewis @ 1:45 pm GMT Feb 18,2020

A quirky little feature of interval partitioning showed up on Twitter today – a parallel insert that would only use a single PX slave to do the inserting. With 1.1 billion rows and the option for running parallel 32 this made the loading process rather slower than it ought to have been.

Fortunately it’s quite easy to model (and work around) the oddity. So here’s a small data set and an empty partitioned table to work with:


rem
rem     Script:         pt_int_load_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem

create table t1 
nologging 
as
select 
        ao.* 
from 
        all_Objects ao, 
        (select rownum id from dual connect by level <= 20)
;

create table pt1
partition  by range (object_id) interval (1000000) (
        partition p1 values less than (1)
)
as
select  * 
from    all_Objects
where   rownum = 0
/

I’ve created a table by copying all_objects 20 times which, for my little sandbox, has given me a total of about 1.2M rows. Then I’ve created an empty interval-partitioned clone of all_objects, with the first partition defined to hold all rows where the object_id is less than 1 (and there’s no object in the database that could match that criterion). I’ve defined the interval to be 1,000,000 and since the highest object_id in my database is about 90,000 the first partition that gets added to this table will be able to hold all the data from t1.

So now we try to do a parallel insert from t1 into pt1, and check the execution plan and parallel execution statistics:


set serveroutput off

insert /*+ append enable_parallel_dml parallel(6) */ into pt1 select * from t1;

select * from table(dbms_xplan.display_cursor);

start pq_tqstat

Note how I’ve used the hint /*+ enable_parallel_dml */ (possible a 12c hint back-ported to 11.2.0.4) rather than using an “alter session”, it’s just a little convenience to be able to embed the directive in the SQL. The pq_tqstat script is one I published some time ago to report the contents of the session-specific dynamic performance view v$pq_tqstat immediately after running a parallel statement.

Here’s the plan:


SQL_ID  25hub68pf1z1s, child number 0
-------------------------------------
insert /*+ append enable_parallel_dml parallel(6) */ into pt1 select *
from t1

Plan hash value: 2888707464

-------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                            |          |       |       |   631 (100)|          |        |      |            |
|   1 |  PX COORDINATOR                             |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)                       | :TQ10001 |  1235K|   159M|   631  (10)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HIGH WATER MARK BROKERED)| PT1      |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING          |          |  1235K|   159M|   631  (10)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE                             |          |  1235K|   159M|   631  (10)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX SEND RANDOM LOCAL                  | :TQ10000 |  1235K|   159M|   631  (10)| 00:00:01 |  Q1,00 | P->P | RANDOM LOCA|
|   7 |        PX BLOCK ITERATOR                    |          |  1235K|   159M|   631  (10)| 00:00:01 |  Q1,00 | PCWC |            |
|*  8 |         TABLE ACCESS FULL                   | T1       |  1235K|   159M|   631  (10)| 00:00:01 |  Q1,00 | PCWP |            |
-------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
   - Degree of Parallelism is 6 because of hint

The most important detail of this plan is that the PX slaves do the load as select (operation 3), then send a message to the query coordinator (PX send QC, operation 2) to tell it about the data load. They do not send their data to the QC for the QC to do the load.

So the plan says we will be doing parallel DM, but here’s what v$pq_tqstat tells us:


DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES ROW_SHARE DATA_SHARE      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- --------- ---------- ---------- ---------- -----------
         1          0 Producer               1 P006                215880   34785363     17.47      16.86         16          0           0
                                             1 P007                202561   34436325     16.39      16.69         17          0           0
                                             1 P008                207519   34564496     16.79      16.75         17          0           0
                                             1 P009                208408   34594770     16.86      16.77         17          0           0
                                             1 P00A                198915   33529627     16.10      16.25         16          0           0
                                             1 P00B                202537   34430603     16.39      16.69         16          0           0
                      Consumer               1 P000                     0        144      0.00       0.00         51         47           0
                                             1 P001                     0        144      0.00       0.00         51         47           0
                                             1 P002               1235820  206340464    100.00     100.00         75         47           0
                                             1 P003                     0        144      0.00       0.00         51         47           0
                                             1 P004                     0        144      0.00       0.00       1138       1134           0
                                             1 P005                     0        144      0.00       0.00       1137       1133           0

                    1 Producer               1 P000                     0         24      0.00       5.91         51         42           0
                                             1 P001                     0         24      0.00       5.91         50         41           0
                                             1 P002                     2        286    100.00      70.44         58         14           0
                                             1 P003                     0         24      0.00       5.91         51         43           0
                                             1 P004                     0         24      0.00       5.91         51         42           0
                                             1 P005                     0         24      0.00       5.91         51         43           0
                      Consumer               1 QC                       2        406    100.00     100.00        311        179           0

19 rows selected.

The query did run parallel 6 as hinted – and 6 PX slaves scanned the t1 table; but they all sent all their data to one PX slave in the second slave set and that one PX slave did all the inserts. The plan was parallel, but the execution was effectively serial. (You’ll note there is something a little odd about the waits and timeout for p004 and p005 when they are acting as consumers. I may worry about that later, but it could be a host-based side effect of running parallel 6 on a VM with 4 CPUs).

The serialization leads to two questions

  1. What went wrong?
  2. How do we work around this and make the insert “truly” parallel

My answer to (1) is “I don’t know – but I’ll look at it if necessary” combined with the guess – it’s something to do with the table having only one partition at the outset and this has an unexpected side effect on the randomising function for the PX distribution.

My answer to (2) is “if I’m right about (1), why not try pre-defining two partitions, and I’ll even let both of them stay empty”.

So here’s my new definition for pt1:


create table pt1
partition  by range (object_id) interval (1000000) (
        partition p0 values less than (0),
        partition p1 values less than (1)
)
as
select  * 
from    all_Objects
where   rownum = 0
/

Re-running the test with the completely redundant, and permanently empty p0 partition the plan doesn’t change but the results from v$pq_tqstat change dramatically:


DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES ROW_SHARE DATA_SHARE      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- --------- ---------- ---------- ---------- -----------
         1          0 Producer               1 P006                207897   34581153     16.82      16.76         23          4           0
                                             1 P007                215669   34786429     17.45      16.86         30          5           0
                                             1 P008                221474   36749626     17.92      17.81         28          5           0
                                             1 P009                204959   34497164     16.58      16.72         22          2           0
                                             1 P00A                177755   30141002     14.38      14.61         21          0           0
                                             1 P00B                208066   35585810     16.84      17.25         25          2           0
                      Consumer               1 P000                213129   35612973     17.25      17.26         82         57           0
                                             1 P001                200516   33570586     16.23      16.27         84         55           0
                                             1 P002                203395   33950449     16.46      16.45         83         56           0
                                             1 P003                205458   34235575     16.63      16.59         82         54           0
                                             1 P004                204111   33999932     16.52      16.48        581        555           0
                                             1 P005                209211   34971669     16.93      16.95        580        553           0

                    1 Producer               1 P000                     2        286     16.67      16.67        422        149           0
                                             1 P001                     2        286     16.67      16.67        398        130           0
                                             1 P002                     2        286     16.67      16.67        405        128           0
                                             1 P003                     2        286     16.67      16.67        437        161           0
                                             1 P004                     2        286     16.67      16.67        406        116           0
                                             1 P005                     2        286     16.67      16.67        440        148           0
                      Consumer               1 QC                      12       1716    100.00     100.00        242        111           0



19 rows selected.

Every consumer receives and inserts roughly 200,000 rows – it’s a totally fair parallel DML. Timings are pretty irrelevant for such a small data set but the excution time did drop from 7 seconds to 4 seconds when parallelism was working “properly”.

I’ve tested this script on 12.2.0.1 and 19.3.0.0 – the same anomaly appears in both versions though it might be worth noting that the strange skew in the waits and timeouts doesn’t appear in 19.3.0.0.

January 10, 2020

Collection limitation

Filed under: Execution plans,Hints,Oracle,Troubleshooting — Jonathan Lewis @ 5:43 pm GMT Jan 10,2020

The ODC SQL and PL/SQL forum came up with an example a couple of days ago that highlighted an annoying limitation in the optimizer’s handling of table functions. The requirement was for a piece of SQL that would generate “installments” information from a table of contract agreements and insert into another table any installments that were not yet recorded there.

The mechanism to turn a single row of contract data into a set of installments was a (optionally pipelined) table function that involved some business logic that (presumably) dealt with the timing and size of the installments. The final SQL to create the data that needed to be inserted was reported as follows (though it had clearly been somewhat modified):

select 
        y.*,
        trunc(sysdate,'mm'),
        user
from 
        table_a a
join 
        table(
                my_function(
                        a.loan_acct_nbr, 
                        a.start_dt,
                        a.maturity_dt,
                        a.num_of_terms
                )
        ) y
on 
        a.loan_acct_nbr = y.loan_acct_nbr
where 
        a.yr_mnth_cd = to_char(add_months(sysdate,-1),'yyyymm')       -- last month
and     a.loan_typ   = 'ABC'
and     not exists ( 
                select  1 
                from
                        final_load_table l
                where
                        l.loan_acct_nbr = y.loan_acct_nbr
                and     l.yr_mnth_cd    = y.yr_mnth_cd 
        )
;

In this query, table_a is the set of contracts, final_load_table is the set of installments, and my_function() is the pipelined function returning a table of installments derived from the start date, maturity date, and number of installments for a contract. The query needs a “not exists” subquery to eliminate any installments that are already known to the database. Once this query is operating efficiently it could be used either to drive a PL/SQL loop or (generally the better strategy) to do a simple “insert as select”.

We were told that the function would return around 60 rows per contract; that the starting size of the final_load_table would be about 60M rows, and the size of the result set from the initial join would be about 60M or 70M rows (which suggests about 1M rows acquired from table_a).

The owner of this problem seemed to be saying that the query had returned no data after 4 or 5 hours – which suggests (a) the optimizer chose a bad plan and/or (b) the PL/SQL function is working very inefficiently and/or (c) the join had generated a vast amount of data but the effect of the subquery was to discard all of it .

Clearly we needed to see an execution plan (preferably from SQL Monitor) and be given have some idea of how much of the 60M/70M rows predicted for the join would be discarded because it already existed.

The owner did send us an execution plan – which included a very obvious threat and suggested a couple of optimizer errors – but supplied it as a picture rather than a flat text.

You’ll notice, of course, that the tables and columns have changed from the original statement. More significantly, though, there are serious problems with the numbers – the estimated row count for the basic join is only 5,326 rather than 50 Million which, at first sight, is probably why the optimizer has decided that a filter subquery (operation 1) to do an index-only probe (operation 5) is a good way of handling the subquery. Perhaps if the estimates had been a little more accurate (e.g. through the addition of a couple of column groups or, if necessary, by opt_estimate() or cardinality() hints) the subquery would have been unnested and turned into a hash anti-join.

I suggested a quick test of a suitable cardinality() hint – but ran up a little model to check that I’d got the hint right – and found that I had but it wasn’t going to help. So I decided to write up the model (and a possible solution for the owner of the problem) in this blog note.

Here’s the code to create the set of objects I’m going to work with. The naming follows the naming in the original statement of the problem suggested by the owner:


rem
rem     Script:         table_function_plan.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table table_a (
        loan_acct_nbr   number,
        yr_mnth_cd      varchar2(6),
        start_dt        date,
        maturity_dt     date    ,
        number_of_terms number,
        constraint ta_pk primary key(loan_acct_nbr, yr_mnth_cd)
)
;

execute dbms_random.seed(0)

insert /*+ append */
into    table_a
with generator as (
        select rownum id
        from    dual
        connect by
                level <= 4000 -- > comment to avoid wordpress issue
)
select
        trunc(dbms_random.value(1e9, 2e9)),
        to_char(sysdate-(365-mod(rownum,365)),'yyyymm'),
        sysdate-(365-mod(rownum,365)),
        sysdate+(1500+mod(rownum,365)),
        60
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

create table final_load_table_l(
        loan_acct_nbr   number,
        yr_mnth_cd      varchar2(6),
        v1              varchar2(10),
        padding         varchar2(200),
        constraint lt_pk primary key(loan_acct_nbr, yr_mnth_cd)
)
;

insert /*+ append */ into final_load_table_l
with generator as (
        select rownum id
        from    dual
        connect by
                level <= 4000 -- > comment to avoid wordpress issue
)
select
        trunc(dbms_random.value(1e9, 2e9)),
        to_char(sysdate-(365-mod(rownum,365)),'yyyymm'),
        lpad(rownum,10),
        lpad('x',200,'x')
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;


begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'table_a',
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'final_load_table_l',
                method_opt  => 'for all columns size 1'
        );
end;
/

create type my_row_type as object (
        loan_acct_nbr   number,
        yr_mnth_cd      varchar2(6),
        v1              varchar2(10),
        padding         varchar2(200)
);
/

create type my_table_type as table of my_row_type;
/

create  or replace function my_function (
        i_loan_acct_nbr         in      number,
        i_yr_mnth_cd            in      varchar2,
        i_start_dt              in      date,
        i_maturity_dt           in      date,
        i_number_of_terms       in      number
)       return  my_table_type pipelined
as
begin
        for i in 1..i_number_of_terms loop
                pipe row (
                        my_row_type(
                                i_loan_acct_nbr,
                                to_char(i_start_dt+32*i,'yyyymm'),
                                i,
                                lpad('x',200,'x')
                        )
                );
        end loop;
        return;
end;
/

I was planning to create some large tables – hence the option to generate 16M rows from my generator CTEs – but I discovered the critical problem almost as soon as I had some data and code in place, so I didn’t need to go large.

I’ve had to create an object type and table type in order to create a pipelined function that returns the table type by piping rows of the object type. The data I’ve created, and the way the function generates data probably doesn’t bear much resemblance to the real system of course, but I don’t think it needs to be very close to make the crucial point.

Here’s the text of the select statement the OP wants to run, with the execution plan I got from my data set after running the query and pulling the plan from memory:

alter session set statistics_level = all;

select 
        /*+ find this 1 */
        y.* 
from 
        table_a a, 
        table(my_function(
                a.loan_acct_nbr,
                a.yr_mnth_cd,
                a.start_dt,
                a.maturity_dt,
                a.number_of_terms
        )) y
where 
        a.yr_mnth_cd = '202001'
and     not exists (
                select  
                        /*+ unnest */
                        null
                from    final_load_table_l l
                where   l.loan_acct_nbr = y.loan_acct_nbr
                and     l.yr_mnth_cd    = y.yr_mnth_cd
        )
;

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |      1 |        |  5059K(100)|  14580 |00:00:00.15 |   16330 |
|*  1 |  FILTER                             |             |      1 |        |            |  14580 |00:00:00.15 |   16330 |
|   2 |   NESTED LOOPS                      |             |      1 |   6283K| 10421   (8)|  14580 |00:00:00.11 |     335 |
|*  3 |    TABLE ACCESS FULL                | TABLE_A     |      1 |    769 |    10  (10)|    243 |00:00:00.01 |     297 |
|   4 |    COLLECTION ITERATOR PICKLER FETCH| MY_FUNCTION |    243 |   8168 |    14   (8)|  14580 |00:00:00.10 |      38 |
|*  5 |   INDEX UNIQUE SCAN                 | LT_PK       |  14580 |      1 |     1   (0)|      0 |00:00:00.02 |   15995 |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NULL)
   3 - filter("A"."YR_MNTH_CD"='202001')
   5 - access("L"."LOAN_ACCT_NBR"=:B1 AND "L"."YR_MNTH_CD"=:B2)


I’ve put in a hint to tell the optimizer to unnest the subquery – and it didn’t. Oracle does not ignore hints (unless they’re illegal, or out of context, or the optimizer never got to them, or you’ve found a bug) so why did Oracle appear to ignore this hint? There’s a really nice feature in 19.3 execution plans – you can request a hint report for a plan, and here’s the hint report for this query (ignore the bits about “find this” being an error):


Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (U - Unused (1), E - Syntax error (2))
---------------------------------------------------------------------------
   0 -  SEL$1
         E -  find
         E -  this

   5 -  SEL$3
         U -  unnest / Invalid correlated predicates


I put in an /*+ unnest */ hint to unnest the subquery, and I’ve been told that the predicates are not valid. The only thing about them that could be invalid is that they come from a pipelined function that has returned an object type. The pipelined function does not behave exactly like a table. But this gives me a clue about forcing the unnest to happen – hide the pipelined function inside a non-mergeable view.


select
        /*+ find this 2 */
        v.*
from    (
        select  /*+ no_merge */
                y.* 
        from 
                table_a a, 
                table(my_function(
                        a.loan_acct_nbr,
                        a.yr_mnth_cd,
                        a.start_dt,
                        a.maturity_dt,
                        a.number_of_terms
                )) y
        where 
                a.yr_mnth_cd = '202001'
        )       v
where   not exists (
                select
                        /*+ unnest */
                        null
                from    final_load_table_l l
                where   l.loan_acct_nbr = v.loan_acct_nbr
                and     l.yr_mnth_cd    = v.yr_mnth_cd
        )
/


------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |             |      1 |        | 10628 (100)|  14580 |00:00:00.12 |     387 |       |       |          |
|*  1 |  HASH JOIN RIGHT ANTI                |             |      1 |   6283K| 10628  (10)|  14580 |00:00:00.12 |     387 |  1878K|  1878K| 2156K (0)|
|   2 |   INDEX FAST FULL SCAN               | LT_PK       |      1 |  10000 |     6  (17)|  10000 |00:00:00.01 |      91 |       |       |          |
|   3 |   VIEW                               |             |      1 |   6283K| 10371   (8)|  14580 |00:00:00.11 |     296 |       |       |          |
|   4 |    NESTED LOOPS                      |             |      1 |   6283K| 10371   (8)|  14580 |00:00:00.10 |     296 |       |       |          |
|*  5 |     TABLE ACCESS FULL                | TABLE_A     |      1 |    769 |    10  (10)|    243 |00:00:00.01 |     296 |       |       |          |
|   6 |     COLLECTION ITERATOR PICKLER FETCH| MY_FUNCTION |    243 |   8168 |    13   (0)|  14580 |00:00:00.10 |       0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("L"."LOAN_ACCT_NBR"="V"."LOAN_ACCT_NBR" AND "L"."YR_MNTH_CD"="V"."YR_MNTH_CD")
   5 - filter("A"."YR_MNTH_CD"='202001')

I’ve wrapped the basic join between table_a and the table function in an inline view called (unimaginatively) v, and I’ve added a /*+ no_merge */ hint to that inline view. So the main query becomes a select from a non-mergeable view with a “not exists” subquery applied to a couple of view columns. And Oracle thinks that that’s okay – and my unnest hint encouraged the optimizer to use a hash anti-join.

So here’s an option for the owner of the problem – but with a few outstanding questions: will a rewrite of their query in this form do the same, will the performance of the select be good enough, and will the subsequent “insert as select” keep the same driving plan.

There’s one significant detail to worry about: the build table in this hash (anti-)join is going to be constructed from 50M (load_acct_bfr, yr_mnth_cd) pairs – which means the workarea requirement is likely to be about 1.2GB for an optimial (i.e. in-memory) hash join; otherwise the join may spill to disc and do a lot of I/O – probably as a one-pass hash join.

(Did you notice,by the way, that the word storage appeared at operation 3 in the original plan?  That suggests a nice big Exadata box; however, there’s no storage predicate in the Predicate Information section for that operation and you would have thought that lease_type = ‘US_SSP’ would be pushed to storage, so maybe this is a ZFS Pillar backing a less exotic server.)

Conclusion

Some (if not all) types of correlated subqueries behave badly if the correlation predicates involve table functions. But you may be able to work around the issue by hiding part of the query, including the table function, inside a non-mergeable inline view before applying the subquery to the view.

Footnote

When I realised that the table function was causing a problem unnesting I remembered that I had written about a similar problem a few years ago – after searching for a little while I discovered a draft note that I had started in September 2015 but had not quite finished; so I’ll be finishing it off and publishing it some time in the next few days. (Update: now published)

 

 

Next Page »

Website Powered by WordPress.com.