Oracle Scratchpad

February 4, 2020

maxquerylen

Filed under: Infrastructure,Oracle,undo — Jonathan Lewis @ 3:50 pm GMT Feb 4,2020

The view v$undostat is a view holding summary information about undo activity that can be used by the automatic undo mechanism to deal with optimising the undo retention time (hence undo space allocation). The view holds one row for every ten minute interval in the last 4 days (96 hours) and includes two columns called maxquerylen and maxqueryid – which tell you something about the query that was considered to be the longest running query active in the interval.

In this note I want to explain why the contents of these two columns are sometimes (possibly often) completely irrelevant despite there being a few notes on the internet about how you should investigate them to help you decide on a suitable setting for the undo_retention.

The descriptions in the 19c reference manual for these columns is as follows:

  • maxquerylen – Identifies the length of the longest query (in seconds) executed in the instance during the period. You can use this statistic to estimate the proper setting of the UNDO_RETENTION initialization parameter. The length of a query is measured from the cursor open time to the last fetch/execute time of the cursor. Only the length of those cursors that have been fetched/executed during the period are reflected in the view.
  • maxqueryid – SQL identifier of the longest running SQL statement in the period

It would be hard to misunderstand the meaning of the second column – if the first column tells us that Oracle has spotted a “longest query” then the second column gives us the sql_id so we can check v$sql to find out what it was. But what sort of queries are going to show up as the “longest query” in the interval?

Here’s an example from a few hours of a nearly idle instance, querying the begin and end times (formatted to show just day of month + time), with the two critical columns,


select  begin_time, end_time, maxquerylen, maxqueryid 
from    v$undostat 
order by 
    begin_time
;

BEGIN_TIME	END_TIME    MAXQUERYLEN MAXQUERYID
--------------- ----------- ----------- -------------
...
04 10:50:18	04 11:00:18	      0
04 11:00:18	04 11:10:18	    356 f3yfg50ga0r8n
04 11:10:18	04 11:20:18	    883 25u1mbkcr9rnu
04 11:20:18	04 11:30:18	   1486 25u1mbkcr9rnu
04 11:30:18	04 11:40:18	   2090 25u1mbkcr9rnu
04 11:40:18	04 11:50:18	      0
04 11:50:18	04 12:00:18	   3299 25u1mbkcr9rnu
04 12:00:18	04 12:10:18	   3903 25u1mbkcr9rnu
04 12:10:18	04 12:20:18	   4507 25u1mbkcr9rnu
04 12:20:18	04 12:30:18	      0
04 12:30:18	04 12:40:18	      0
04 12:40:18	04 12:50:18	      0
04 12:50:18	04 13:00:18	      0
04 13:00:18	04 13:10:18	      0
04 13:10:18	04 13:20:18	      0
04 13:20:18	04 13:30:18	      0
04 13:30:18	04 13:37:27	   9035 25u1mbkcr9rnu

173 rows selected.

Notice, particularly, that the SQL_ID 25u1mbkcr9rnu disappears from the 11:40 interval, then re-appears at 11:50, then disappears again from 12:20 through 13:20 (lunchtime), then reappears again at 13:30. And when it reappears after an absence the query length has increased in a way that’s consistent with the gap. So it looks as if the query wasn’t running during the gap, but turns out to have been running after the gap ended. (Is this Quantum queries?)

The explanation is in the detail of the definition: “from the cursor open time to the last fetch/execute time”. From an SQL*Plus session I “set pause on” then executed the query “select * from all_objects” and hit return a few times to get a few pages of output. Then, spread over the next couple of hours, I hit return a few more times to get a few more pages of output. Each time I hit return my session does another fetch call, and the code behind v$undostat notices that my query is still active.

I don’t know exactly how Oracle is keeping track of “active” statements because there seem to be some inconsistencies in the reporting (and I’ll comment on those later), but as a first approximation, until you close a cursor (either explicitly or implicitly) some piece of Oracle’s code registers the fact that the query might do further fetches, which means it might need to apply more undo to make current data read-consistent with the cursor’s start SCN, which means that it should take note of the time the cursor has been open because the undo retention time might need to be that long.

Inconsistencies.

I said there were some inconsistencies in the reporting. I’ve noticed three anomalies – perhaps due to the extreme idleness of the instance I was using for testing.

  1. At about 12:45 I hit return a few times to get the maxquerylen and maxqueryid refreshed – but the code didn’t seem to notice that I had done a few more fetches of a long running query. So it seems to be possible for Oracle to miss the query that should be reported.
  2. At about 11:52 I hit return a few times again, and you can see that the two critical columns were refreshed, but if you do the arithmetic Oracle has attributed 600 seconds to the query – the whole of the 10 minute refresh interval, not just the time up to the moment of the last fetch that I had done in that interval.
  3. I didn’t hit return again until about 12:45 (the moment in point 1 above when the two columns didn’t refresh the way I though they should), but the columns kept updating through 12:00 and 12:10 intervals anyway before disappearing from the 12:20 interval. So it looks like queries can be reported as long running even when they haven’t been executing or fetching.

The is one last anomaly – and this relates to the reason I started looking at this columns. If you check the entry for 11:00 you’ll see that the SQL Id ‘f3yfg50ga0r8n’ has been reported as “running” for 356 seconds. But here’s what I found in v$sql for that query:


select  executions, parse_calls, fetches, end_of_fetch_count, elapsed_time, cpu_time, rows_processed, sql_text 
from    v$sql 
where   sql_id = 'f3yfg50ga0r8n'
;

EXECUTIONS PARSE_CALLS	  FETCHES END_OF_FETCH_COUNT ELAPSED_TIME   CPU_TIME ROWS_PROCESSED SQL_TEXT
---------- ----------- ---------- ------------------ ------------ ---------- -------------- ------------------------------------------
        79          79         79                  0        20487      10667             79 select obj# from obj$ where dataobj# = :1


The SQL lookes like a sys-recursive statement which, in 79 executions, has accumulated 20 milliseconds of elapsed time (rather than 356 seconds – but that difference could just be related to one or other of the anomalies I reported earlier). The key thing to note is that the value of column end_of_fetch_count is zero: this looks like a statement where Oracle has simply not bothered to fetch all the data and has then not bothered to close the cursor. As a result it’s possible that each time the statement is executed (note that parse_calls = executions, so we’re not looking at a “held” cursor) the code behind v$undostat looks back at the time the cursor was initially opened to measure the query length, rather than looking at the time the statement was re-executed.

This may go a long way to answering the question that came up on Oracle-l earlier on today as follows:

The following query (sql_id is 89w8y2pgn25yd) was recorded in v$undostat.maxqueryid in a 12.2. database during a period of a high undo usage: select ts# from sys.ts$ where ts$.online$ != 3 and bitand(flags,2048) != 2048;

select
        undoblks,txncount,maxquerylen,maxconcurrency,activeblks
from    v$undostat u
where   maxqueryid='89w8y2pgn25yd'
;

UNDOBLKS  TXNCOUNT  MAXQUERYLEN  MAXCONCURRENCY  ACTIVEBLKS
--------  --------  -----------  --------------  ----------
   39199      4027         1378               5     2531960

What is this query, and how does it manage to report a maximum query length of 1,378 seconds? Just like the one above it’s a sys-recursive query; and this one appears when you query dba_tablespaces – and though it executes once for every row and takes just fractions of a second to execute. But if you trace a query like “select tablespace_name from dba_tablespaces” you’ll see that every time the query is called the trace file will show lines for: “Parse, Exec, Fetch, Close” until the last call – which doesn’t report a “Close”.

Just like my original “select * from all_objects” there’s a query dangling with an open cursor, leaving Oracle with the opportunity to go back to the moment it was opened and report it as a “long running query”.

tl;dr

The maxquerylen and maxqueryid in v$undostat don’t tell you about statements that have taken a long time to change a lot of data and generate a lot of undo; they tell you about statements that might need to apply a lot of undo to see read-consistent data and therefore might become victims in a “ORA-01555: snapshot too old” event.

For various reasons the statements reported may be completely irrelevant because there are various reasons (some, possibly, bug-related) why a cursor may be treated as if it opened a long time in the past when it was actually a re-use of an existing “dangling” cursor. It’s also possible that some bad client code will treat cursor in a way that does no harm to the client program but hides a more interesting query that would otherwise have been reported by these two columns.

 

 

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.