Oracle Scratchpad

April 22, 2020

Eureka!

Filed under: Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 1:08 pm BST Apr 22,2020

I woke up last night with a brilliant solution to a problem that’s been bugging me for more than a year. How does a call to report_sql_monitor() manage to produce output like this:

SQL Plan Monitoring Details (Plan Hash Value=4262489872)
======================================================================================================================================================
| Id |            Operation            | Name  |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read  | Read  | Activity | Activity Detail |
|    |                                 |       | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes |   (%)    |   (# samples)   |
======================================================================================================================================================
|  0 | SELECT STATEMENT                |       |         |      |        13 |     +2 |     1 |        1 |       |       |          |                 |
|  1 |   SORT AGGREGATE                |       |       1 |      |        13 |     +2 |     1 |        1 |       |       |          |                 |
|  2 |    NESTED LOOPS                 |       |      5M |  14M |        13 |     +2 |     1 |       4M |       |       |          |                 |
|  3 |     NESTED LOOPS                |       |      5M |  14M |        13 |     +2 |     1 |       4M |       |       |          |                 |
|  4 |      TABLE ACCESS FULL          | T1    |      5M | 4686 |        13 |     +2 |     1 |       4M |  1050 | 260MB |          |                 |
|  5 |      INDEX RANGE SCAN           | T2_I1 |       1 |    2 |        14 |     +1 |    5M |       4M |  5494 |  83MB |          |                 |
|  6 |     TABLE ACCESS BY INDEX ROWID | T2    |       1 |    3 |        13 |     +2 |    5M |       4M | 17743 | 260MB |          |                 |
======================================================================================================================================================


If you’re wondering why this is a problematic output take a look at the A-Rows column for operation 4 (and 5), and the Execs colukmn for operations 5 (and 6). Given the nature of a nested loop join the number of Execs of operation 5 should match the number of rows (A-rows) generated by operation 4, and the number of Execs of operation 6 should match the number of rows generated by operation 5.

But Oracle claims to have generated 4 million rows then executed 5 million times in both cases. How does that happen?

The very simple thought I had last night was “formatting” – if the actual value is 4,500,000 maybe the A-rows rounds down and the Execs rounds up, so I set up a little test to check the hypothesis. Since the version where I’d first seen the anomaly was 12.1.0.2 that’s the version I tested first.

rem
rem     Script:         report_sql_monitor_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem 

drop table t2 purge;
drop table t1 purge;

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',25,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 4500100 -- > comment to avoid WordPress format issue
/

create index t1_i1 on t1(id);

create table t2 as select * from t1;
create index t2_i1 on t2(id);

alter system flush shared_pool;
alter system flush buffer_cache;

select  /*+
                monitor
                qb_name(main)
                leading(@inline t1@inline t2@inline)
                use_nl_with_index(@inline t2@inline)
        */
        count(n2)
from    (
        select
                /*+
                        qb_name(inline)
                */
                t1.n1, t2.n1 n2
        from    t1, t2
        where   t1.id <= 4499999
--      where   t1.id <= 4500000
--      where   t1.id <= 4500001
        and     t2.id =  t1.n1
        )
;

prompt  =======================================
prompt  Now run report_sql_monitor.sql from SYS
prompt  =======================================

You’ll note that I’ve allowed three possible values for the number of rows I get from t1 to drive the index probe into t2 – I thought I’d have to do several experiments to make the oddity appear. In fact it appeared on the very first attempt: A-Rows reported 4M, and Execs report a clearly incorrect 5M. (Repeating the test on 12.2.0.1 the anomaly disappeared – all the output reported 4M for 4,499,999; and 5M for 4,500,000.)

You’ll notice the “flush buffer_cache” command in this script, this came about because of some secondary testing. Because I wanted to check a few variations I commented out the “drop table” commands. The consequence of not dropping the tables was that the anomaly disappeared – in some way it was connected to reading blocks from disc and didn’t appear once the table and index were cached. In fact when I set the driving count to 4,998,000 in one of my tests I got a result that looked slightly more counter-intuitive than the original:

=====================================================================================================================================================
| Id |            Operation            | Name  |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity | Activity Detail |
|    |                                 |       | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |   (# samples)   |
=====================================================================================================================================================
|  0 | SELECT STATEMENT                |       |         |      |        11 |     +2 |     1 |        1 |      |       |          |                 |
|  1 |   SORT AGGREGATE                |       |       1 |      |        11 |     +2 |     1 |        1 |      |       |          |                 |
|  2 |    NESTED LOOPS                 |       |      4M |  14M |        11 |     +2 |     1 |       4M |      |       |          |                 |
|  3 |     NESTED LOOPS                |       |      4M |  14M |        11 |     +2 |     1 |       4M |      |       |          |                 |
|  4 |      TABLE ACCESS FULL          | T1    |      4M | 4686 |        11 |     +2 |     1 |       4M | 1050 | 260MB |          |                 |
|  5 |      INDEX RANGE SCAN           | T2_I1 |       1 |    2 |        12 |     +1 |    4M |       4M |  345 |  83MB |          |                 |
|  6 |     TABLE ACCESS BY INDEX ROWID | T2    |       1 |    3 |        11 |     +2 |    5M |       4M | 1048 | 262MB |          |                 |
=====================================================================================================================================================


Notice how I’ve (“correctly”) acquried 4M index rowids at operation 5, but (apparently) managed to access the table 5M times!

As I noted above, the oddity seems to have disappeared in 12.2.0.1 so it’s not worth pursuing further. I think it might relate either to Oracle double counting the Exec when its first access is a phyiscal read, or maybe it’s managing to increment the wrong counter when it visits a space management block to find the next batch of blocks to read. The detail is really not very important.

What might be important, though, is that sometimes the number of Execs you see is truthfully greater than ought to be possible because of the strange effects that can appear with read-consistency. For a very long time I believed that the oddity I’ve show here was something to do with read-consistency and query restart, but I couldn’t come up with a hypothesis to explain how that could be happening. So I was very relieved last night to come up with a simple explanation (which actually turned out to be wrong, but did at least help me recognise how the anomaly could happen in perfectly harmless circumstances).

Footnote:

What’s the difference between 4 million and 5 million – sometimes it’s one (just one, not one million). When testing under 12.2.0.1 I only had to add one row to the driving rowsource (changing 4,499,999 to 4,500,000) to see the A-rows and Execs change from 4M to 5M. If you forget the implicit rounding errors in reports like this you can end up chasing a tiny anomaly because it looks like a huge anomaly.

 

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.