Oracle Scratchpad

June 13, 2023

Case Study

Filed under: Oracle,Performance,Problem Solving,Troubleshooting,Tuning — Jonathan Lewis @ 12:50 pm BST Jun 13,2023

Here’s a problem that appeared on the Oracle-L list server a little while ago. It features an Exadata system with a “very large” GTT (global temporary table) and a pair of queries that are repeated frequently, using a different literal value on each repetition of the pair.

The queries, with sample execution plans, have been reported using the SQL Monitor feature and are available on GitHub, but I’ll repeat them here, individually, for convenience. The first query is a very simple single-table query with aggregation:

SELECT FG.FID, FG.DCD, FG.MDT, 1.00, SUM(FG.N_CD), SUM(FG.d_nsd), and several more summed columns
    FROM F_GTT FG
   WHERE FG.FID = {literal constant}
GROUP BY FG.FID, FG.DCD, FG.MDT
ORDER BY 1, 3


Global Information
------------------------------
 Status              :  DONE (ALL ROWS)           
 Instance ID         :  3                         
 SQL ID              :  gmm2v6dtvu0yb             
 SQL Execution ID    :  50331648                  
 Execution Started   :  06/01/2023 14:52:21       
 First Refresh Time  :  06/01/2023 14:52:25       
 Last Refresh Time   :  06/01/2023 14:53:11       
 Duration            :  50s                       
 Fetch Calls         :  1                         

Global Stats
==================================================================================================================================================
| Elapsed |   Cpu   |    IO    | Application | Cluster  |  Other   | Fetch | Buffer | Read  | Read  | Uncompressed |  Offload   |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Waits(s) | Calls |  Gets  | Reqs  | Bytes |    Bytes     | Elig Bytes | Returned Bytes |
==================================================================================================================================================
|      50 |      48 |     0.68 |        0.01 |     0.00 |     0.72 |     1 |     3M | 45632 |  22GB |         22GB |       22GB |           22GB |
==================================================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3209725901)
==============================================================================================================================================================================
| Id |          Operation           |      Name      |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read  | Read  |  Mem  | Activity |      Activity Detail      |
|    |                              |                | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes | (Max) |   (%)    |        (# samples)        |
==============================================================================================================================================================================
|  0 | SELECT STATEMENT             |                |         |      |        47 |     +4 |     1 |       31 |       |       |     . |          |                           |
|  1 |   SORT GROUP BY              |                |     217 | 792K |        47 |     +4 |     1 |       31 |       |       | 14336 |          |                           |
|  2 |    TABLE ACCESS STORAGE FULL | F_GTT          |   63836 | 792K |        50 |     +1 |     1 |    37491 | 45632 |  22GB |  14MB |    97.96 | Cpu (46)                  |
|    |                              |                |         |      |           |        |       |          |       |       |       |          | cell smart table scan (2) |
==============================================================================================================================================================================

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - storage("FG"."FID"={literal value})
       filter("FG"."FID"={literal value})
	   

It’s not a difficult query, and it’s easy to see that almost all the time is spent in a tablescan that reads 22GB to identify 37,491 rows, with fractions of a second spent aggregating those rows. And of the time spent in the tablescan almost all of it is CPU time (rather than waits for I/O).

Details to note:

  • We’ve done 45,000 read requests for 22GB of data read, which suggests we’re doing reads of about 500KB (rather than the 1MB that you might have expected).
  • We’ve done 3M buffer gets – which is a good match for 22GB of data read with an 8KB block size, so it looks like we’re doing a simple tablescan into the buffer cache. (This is a little unexpected since we spend time on “cell smart table scan”, so it would be nice to see the session statistics for one execution – see the sketch after this list – to find the underlying reason for these buffer gets.)
  • We don’t seem to have saved any data traffic through storage indexes (but that might be because Oracle doesn’t create storage indexes for temporary tablespaces – that’s a detail I’d have to check).
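
As a sketch of the kind of check I have in mind – v$mystat and v$statname are the standard dynamic performance views, though the exact statistics worth capturing may vary with version – you could report the session’s activity immediately after a single execution:

-- run from the session that executed the query, straight after the fetch;
-- compare two consecutive snapshots to get the figures for one execution
select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   sn.statistic# = ms.statistic#
and     sn.name in (
            'session logical reads',
            'table scans (direct read)',
            'cell physical IO bytes eligible for predicate offload',
            'cell physical IO interconnect bytes returned by smart scan',
            'cell physical IO bytes saved by storage index'
        )
order by
        sn.name
;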

It would be nice to know, of course, how many rows there were in the table, how many distinct values of FID there were, and how variable the number of rows per FID was across the table. Based on the little we know, though, and assuming that this single execution represents a fairly typical FID, the basic question seems to be: would we benefit from defining an appropriate index on the table? (There are several other ways you could improve the performance, of course, if you had a completely free hand and could make whatever simple changes you liked in the surrounding code.)

We can note, of course, that with a suitable index we might have done about 37,500 (single block) read requests compared to 45,000 multi-block read requests, and that looks like a small win on I/O with a potentially big win on CPU. The obvious index is on (FID), defined with “compress 1”, but if we created the index on (FID, MDT, DCD) we might get some extra benefit since Oracle could walk the index, using a “SORT GROUP BY NOSORT” operation to aggregate and return the data in the right order.

Note: the SQL says: select fid, dcd, mdt … group by fid, dcd, mdt order by 1, 3 – but Oracle has assumed that the order by clause can be ignored because of the group by (the optimizer’s allowed to do that, programmers aren’t). This means the optimizer has recognized that it could treat the query as group by fid, mdt, dcd and let the sorting done for the group by satisfy the order by (fid, mdt) as a side effect.

Since the initial 37,491 rows for the FID collapsed to 31 rows after aggregation, we might find that we could even create the larger index with “compress 3”.
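
Pulling those thoughts together, here’s a sketch of the suggestion – the index name is hypothetical and the compression level is something to verify against the real data (remember, too, that an index can be added to a GTT only while the table is empty in every session):

create index f_gtt_i1 on f_gtt(fid, mdt, dcd) compress 3;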

Here’s another thought that goes with the idea of creating the perfect index for walking the data and avoiding sorting for every execution of the query: this table is re-populated before the queries are executed, so could we populate the table with pre-ordered data so that all the rows for any individual FID are packed into the smallest number of blocks?
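
If the load is a simple insert/select – a guess about the application code, with f_gtt_source_data as a hypothetical stand-in for the real driving query – the change might be no more than an appended order by:

insert /*+ append */ into f_gtt
select  *
from    f_gtt_source_data
order by
        fid             -- packs each FID’s rows into adjacent blocks
;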

Of course, changes like adding an index and pre-sorting data might have significant side effects in other parts of the code. What we’re trying to do here is find big wins using simple strategies that avoid complicated code changes – we need to be sure that we haven’t introduced side-effects that mean we’ve moved the problem rather than solving it.

We should keep our thoughts about the first query in mind as we move on to the second query. This one is a very simple join and aggregation (via distinct) which features the same very large tablescan:

SELECT  DISTINCT ft.fT_NM, FGTT.DCD, ft.COl5, ft.FT_CID, RGTT.mcg_ind,   C.D_CD
    FROM rt_gtt         RGTT,
         F_GTT          FGTT,
         RD_FTIN ft,
         RDC           c
   WHERE     FGTT.eid = RGTT.nid
         AND FGTT.etyp = RGTT.nety
         AND FGTT.FID = ft.ftid
         AND FGTT.FID = {literal value}
         AND FGTT.DCD = c.DCD
         AND ft.FT_CID = RGTT.mcid
ORDER BY ft.fT_NM, DCD

	   
Global Information
------------------------------
 Status              :  DONE (ALL ROWS)           
 Instance ID         :  3                         
 SQL ID              :  c3y3n73an8b69             
 SQL Execution ID    :  50331648                  
 Execution Started   :  06/01/2023 14:54:39       
 First Refresh Time  :  06/01/2023 14:54:43       
 Last Refresh Time   :  06/01/2023 14:55:17       
 Duration            :  38s                       
 Fetch Calls         :  1                         

Global Stats
==================================================================================================================================================
| Elapsed |   Cpu   |    IO    | Application | Cluster  |  Other   | Fetch | Buffer | Read  | Read  | Uncompressed |  Offload   |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Waits(s) | Calls |  Gets  | Reqs  | Bytes |    Bytes     | Elig Bytes | Returned Bytes |
==================================================================================================================================================
|      39 |      37 |     0.57 |        0.01 |     0.00 |     0.59 |     1 |     4M | 45640 |  22GB |         22GB |       22GB |           22GB |
==================================================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=891026795)
========================================================================================================================================================================================
| Id |             Operation              |             Name             |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read  | Read  |  Mem  | Activity | Activity Detail |
|    |                                    |                              | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes | (Max) |   (%)    |   (# samples)   |
========================================================================================================================================================================================
|  0 | SELECT STATEMENT                   |                              |         |      |         1 |    +38 |     1 |        1 |       |       |     . |          |                 |
|  1 |   SORT UNIQUE                      |                              |   19196 |   6M |         1 |    +38 |     1 |        1 |       |       |  2048 |          |                 |
|  2 |    NESTED LOOPS                    |                              |         |      |         1 |    +38 |     1 |      380 |       |       |     . |          |                 |
|  3 |     NESTED LOOPS                   |                              |   19196 |   6M |         1 |    +38 |     1 |      380 |       |       |     . |          |                 |
|  4 |      NESTED LOOPS                  |                              |      94 |   6M |         1 |    +38 |     1 |        2 |       |       |     . |          |                 |
|  5 |       NESTED LOOPS                 |                              |      1M |   4M |         1 |    +38 |     1 |        2 |       |       |     . |          |                 |
|  6 |        VIEW                        | VW_DTP_E7E675E0              |      3M | 815K |         1 |    +38 |     1 |        2 |       |       |     . |          |                 |
|  7 |         HASH UNIQUE                |                              |      3M | 815K |        35 |     +4 |     1 |        2 |       |       |  11MB |          |                 |
|  8 |          TABLE ACCESS STORAGE FULL | F_GTT                        |      3M | 792K |        39 |     +0 |     1 |    27239 | 45638 |  22GB |  14MB |   100.00 | Cpu (38)        |
|  9 |        TABLE ACCESS BY INDEX ROWID | rt_gtt                       |       1 |    2 |         1 |    +38 |     2 |        2 |       |       |     . |          |                 |
| 10 |         INDEX RANGE SCAN           | rt_gtt_IX5                   |       1 |    1 |         1 |    +38 |     2 |        2 |       |       |     . |          |                 |
| 11 |       TABLE ACCESS BY INDEX ROWID  | RD_FTIN                      |       1 |    3 |         1 |    +38 |     2 |        2 |       |       |     . |          |                 |
| 12 |        INDEX RANGE SCAN            | RD_FTIN_IX1                  |       1 |    2 |         1 |    +38 |     2 |        2 |       |       |     . |          |                 |
| 13 |      INDEX RANGE SCAN              | RDC_IX1                      |     206 |    1 |         1 |    +38 |     2 |      380 |     2 | 16384 |     . |          |                 |
| 14 |     TABLE ACCESS BY INDEX ROWID    | RDC                          |     204 |  169 |         1 |    +38 |   380 |      380 |       |       |     . |          |                 |
========================================================================================================================================================================================

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - storage("FGTT"."FID"={literal value})
       filter("FGTT"."FID"={literal value})
   9 - filter("ITEM_3"="RGTT"."nety")
  10 - access("ITEM_4"="RGTT"."nid")
  11 - filter("FT"."FT_CID"="RGTT"."mcid")
  12 - access("ITEM_2"="FT"."ftid")
  13 - access("ITEM_1"="C"."DCD")

Again we have a query that selects all the rows for a given FID from the global temporary table, but this time it joins them to three other tables before eliminating duplicates. Unsurprisingly, a very large fraction of the time is spent on the tablescan, with the same 45,000 read requests pushing 22GB across the interconnect to the buffer cache, and most of the time recorded as CPU time rather than I/O wait time.

There’s a very interesting feature of this plan, though. The SQL statement says “join then make distinct”, but the plan reports “make distinct then join” – applying a hash unique operation to the GTT data, reducing 27,239 rows to 2 rows, before joining the other three tables. (I’ve written about the “group by placement” transformation in the past; this is the equivalent transformation for select distinct.)

Note: it’s a little odd that two queries running just one and a half minutes apart from each other should have estimates which are out by a factor of two in one case and 100 in the other for a simple “column = constant” predicate. This might be something to do with “out of range” values, or a side effect of a hybrid histogram with an unlucky sample size.

The only thing that really matters in this query is the tablescan of the GTT, and obviously we could again consider creating an index on (FID) as a possible way to reduce the workload, giving us a single extra index that might benefit both queries. In this case, though, there are (only) three other columns from the GTT appearing in the query, so we could create an index that included those columns so that Oracle didn’t have to visit the table at all – for example (FID, eid, etyp, dcd) – with some level of compression that we’d have to decide on by reviewing the data.
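
Sketching that covering index (a hypothetical name again, and a compression level that would have to be tested):

create index f_gtt_i2 on f_gtt(fid, eid, etyp, dcd) compress 1;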

Of course that does leave us wanting two new indexes on the same large table that have to be maintained as the data is loaded (remember that you can’t create indexes on GTTs after the data has been inserted), so we ought to look for alternative strategies.

Again the reduction from tens of thousands of rows to “just a few” rows prompts an idea. I’m guessing that Oracle aggregates (distinct) the GTT table down with the equivalent of an inline view: “select distinct fid, eid, etyp, dcd from … “ (the 10053 (CBO) trace might show this as one of the “unparsed” queries). If that’s the case, why not have a summary GTT that emulates a “refresh complete materialized view” on the main GTT and populate it with the result of that select distinct immediately after populating the main GTT; then reference the summary GTT in the join query instead of the main GTT. With a reduction by a factor of around 10,000 in the number of rows and a further reduction due to the column restriction, the tablescan might drop from 22GB to just a couple of (cached) megabytes.
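
A minimal sketch of the idea – all the names and column types below are hypothetical, and the on commit clause would have to match the way the application uses the main GTT:

create global temporary table f_gtt_distinct (
        fid     number,
        eid     number,
        etyp    varchar2(30),
        dcd     varchar2(30)
)
on commit preserve rows
;

-- immediately after populating f_gtt:
insert into f_gtt_distinct
select  distinct fid, eid, etyp, dcd
from    f_gtt
;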

With the “materialized view” concept in mind, we could revisit the first query. The big problem with the current coding strategy is that we populate a large table once, then keep executing a pair of queries that are interested in one FID at a time. Maybe we could put a wrapper around the whole thing to populate two pre-calculated, small, summary tables, then tweak the two problem queries to reference their corresponding “materialized views”.
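
For the first query the corresponding summary table would hold the sums themselves – again a sketch with hypothetical names and column types, showing only two of the summed columns:

create global temporary table f_gtt_sums (
        fid             number,
        dcd             varchar2(30),
        mdt             date,
        sum_n_cd        number,
        sum_d_nsd       number
)
on commit preserve rows
;

insert into f_gtt_sums
select  fid, dcd, mdt, sum(n_cd), sum(d_nsd)
from    f_gtt
group by
        fid, dcd, mdt
;

With that in place the first query collapses to a simple “select … from f_gtt_sums where fid = {literal constant} order by 1,3” against a very small table.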

Summary

The system populates a very large GTT, then seems to be running a loop to execute two queries per “FID” value. Both queries spend most of their time doing a tablescan of the large GTT before aggregating a few tens of thousands of rows down to a handful of rows.

A basic strategy for improving performance would be to create two new indexes on this table (though one might be “good enough” for both queries in the short term); the indexing strategy could be made more effective if the code to load the GTT could be modified to pre-sort the data before loading it.

An alternative strategy that takes advantage of the massive drop in volume as the data is aggregated is to pre-load two (much smaller) “summary” GTTs after loading the main GTT and before starting the loop through the FIDs. This would require the two queries to be edited to reference their respective summary tables. (The summary tables could, of course, be indexed to add a little extra benefit.)

3 Comments

  1. Hi Jonathan, reading through the stats – something came up as odd to me. Bytes read is 22 GB and bytes returned by offload is also 22 GB. If my memory is correct, this means zero cell offload occurred. Which can mean a few things:
    – All records have the same FID in the temp table
    – All columns of the temp table are selected

    Or it could be hitting some limitation about offload, for example having an active transaction on the records, requiring all blocks to be sent “as is” and then undo/commit/SCN checks being done for each one – which is where the CPU time would’ve been inflated. For 22 GB and just a sum/group by, 48 CPU seconds seems rather high.

    Given this is a global temporary table there’s a high chance it’s a “delete on commit” and records are inserted and read without commit.

    Just a wild guess. As you said – we need session stats to see what’s going on.

    Comment by Christo Kutrovsky — June 19, 2023 @ 6:00 pm BST

    • Christo,
      Thanks for the comment, and sorry about the delay responding.
      I like the “delete on commit” idea.

      We know that there are only a few rows per FID and the same query is used for many different FIDs, so we might expect a storage index to be created on the first query and used to eliminate a huge amount of data thereafter. But this isn’t happening – which may be because Oracle doesn’t create them on GTTs, or may be because every block is showing an uncommitted transaction, and that might also explain the buffer gets being used rather than direct path/smart scan memory.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — July 20, 2023 @ 7:15 pm BST

  2. […] Case Study (June 2023): querying a very large GTT – indexes or “materialized views” […]

    Pingback by Troubleshooting catalogue | Oracle Scratchpad — June 26, 2023 @ 11:49 am BST

