Oracle Scratchpad

March 21, 2022

X$ row sizes

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 10:59 am GMT Mar 21,2022

Here’s a script I’ve used from time to time to help me get some ideas about how to hack through some x$ tables. It’s obviously something that can only be run from the SYS schema, and it’s not something to apply arbitrarily to every x$ table as the side effects could be nasty; but on a sandbox instance with enough CPUs and a large enough SGA it might give you some interesting clues about how Oracle is defining or accessing some of its internal structures.

The first query simply walks through the specified x$ table in index (indx) order comparing the address (addr) of the preceding row with that of the current row and reporting the difference. Some of the x$ tables are likely to give you a random collection of large numbers, which means there is no well-structured row involved. Others will give very regular patterns – though not necessarily a constant value as you move from row to row (take a look at x$ksuru, for example).

The second query takes the lowest and highest addresses of rows in the table, finds the difference, and divides by one less than the number of rows in the table (i.e. allowing for the fact that the highest address is the start of a final row, not the end of a row). If the average size matches the value reported by every other row in the first query’s output, that’s a fairly strong clue that the x$ table may be a simple array of fixed-size rows.

rem
rem     Script:         x_rowsize.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2002 (probably)
rem
rem     Usage:
rem             Connect as SYS (due to x$)
rem             start x_rowsize {x tablename} {number of rows to list}
rem

set linesize 120
set pagesize  60
set trimspool on
set tab off

prompt  ================================================
prompt  Usage:  @x_rowsize {table_name} {number_of_rows}
prompt  ================================================
pause   Press return to continue

define m_xtab  = &1
define m_rowct = &2

column addr_dec         format 999,999,999,999,999
column lag_addr_dec     format 999,999,999,999,999
column row_size         format 999,999,999,999,999

spool x_rowsize

select 
        addr,
        to_number(addr,'XXXXXXXXXXXXXXXX') addr_dec, 
        to_number(lag(addr,1) over (order by indx),'xxxxxxxxxxxxxxxx') lag_addr_dec, 
        to_char(
                to_number(addr,'XXXXXXXXXXXXXXXX') -
                        to_number(lag(addr,1) over (order by indx),'xxxxxxxxxxxxxxxx'),
                        'xxxxxxxxxxxxxxxx'
        ) row_size_hex,
        to_number(addr,'XXXXXXXXXXXXXXXX') -
                to_number(lag(addr,1) over (order by indx),'xxxxxxxxxxxxxxxx') row_size,
        indx 
from    &m_xtab
where   indx <= &m_rowct
order by 
        indx
/

prompt  ========================================================
prompt  Average row size will (almost certainly) not match the
prompt  row gap listed in the above if the x$ table is segmented
prompt  ========================================================

select
        min(addr), max(addr), count(*),
        (to_number(max(addr),'xxxxxxxxxxxxxxxx') -
                to_number(min(addr),'xxxxxxxxxxxxxxxx')
        ) / (count(*) - 1) avg_row_size
from    &m_xtab
/

spool off

It’s a good idea to start with a small sample size in case a complete scan of a structure crashes the session or even the instance – so here’s the result of calling the script with the parameters x$ksuse 10:

================================================
Usage:  @x_rowsize {table_name} {number_of_rows}
================================================
Press return to continue


ADDR                         ADDR_DEC         LAG_ADDR_DEC ROW_SIZE_HEX                  ROW_SIZE       INDX
---------------- -------------------- -------------------- ----------------- -------------------- ----------
000000008E713678        2,389,784,184                                                                      1
000000008E715E60        2,389,794,400        2,389,784,184              27e8               10,216          2
000000008E718648        2,389,804,616        2,389,794,400              27e8               10,216          3
000000008E71AE30        2,389,814,832        2,389,804,616              27e8               10,216          4
000000008E71D618        2,389,825,048        2,389,814,832              27e8               10,216          5
000000008E71FE00        2,389,835,264        2,389,825,048              27e8               10,216          6
000000008E7225E8        2,389,845,480        2,389,835,264              27e8               10,216          7
000000008E724DD0        2,389,855,696        2,389,845,480              27e8               10,216          8
000000008E7275B8        2,389,865,912        2,389,855,696              27e8               10,216          9
000000008E729DA0        2,389,876,128        2,389,865,912              27e8               10,216         10

10 rows selected.

========================================================
Average row size will (almost certainly) not match the
row gap listed in the above if the x$ table is segmented
========================================================

MIN(ADDR)        MAX(ADDR)          COUNT(*) AVG_ROW_SIZE
---------------- ---------------- ---------- ------------
000000008E713678 000000008EDEDD70        704   10222.1451

1 row selected.

At first sight it looks as if the row size for an x$ksuse (v$session) row is 10,216 bytes – but then the average row size turns out to be different, so there’s some inconsistency somewhere in x$ksuse that we can’t see in the first few rows.

After running the report for the whole x$ksuse structure I realised that a quick and easy edit would report just the interesting bits: fold the original query into an inline view and select only the rows where “row_size != 10216”.
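
In outline the edit looks something like this (a sketch – it simply wraps the original select in an inline view, reusing its expressions unchanged, so that the derived column can be filtered):

select
        addr, addr_dec, lag_addr_dec, row_size_hex, row_size, indx
from    (
        select
                addr,
                to_number(addr,'XXXXXXXXXXXXXXXX') addr_dec,
                to_number(lag(addr,1) over (order by indx),'xxxxxxxxxxxxxxxx') lag_addr_dec,
                to_char(
                        to_number(addr,'XXXXXXXXXXXXXXXX') -
                                to_number(lag(addr,1) over (order by indx),'xxxxxxxxxxxxxxxx'),
                                'xxxxxxxxxxxxxxxx'
                ) row_size_hex,
                to_number(addr,'XXXXXXXXXXXXXXXX') -
                        to_number(lag(addr,1) over (order by indx),'xxxxxxxxxxxxxxxx') row_size,
                indx
        from    x$ksuse
        )
where   row_size != 10216
order by
        indx
/

Here’s the result: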

ADDR                         ADDR_DEC         LAG_ADDR_DEC ROW_SIZE_HEX                  ROW_SIZE       INDX
---------------- -------------------- -------------------- ----------------- -------------------- ----------
000000008E8CAB98        2,391,583,640        2,391,571,984              2d88               11,656        177
000000008EA820B8        2,393,383,096        2,393,371,440              2d88               11,656        353
000000008EC395D8        2,395,182,552        2,395,170,896              2d88               11,656        529

That looks like an interesting pattern – especially if you’re good at mental arithmetic (or have a calculator handy) and I tell you that there are 704 rows in x$ksuse. (As a cross-check: 703 gaps made up of 700 × 10,216 plus 3 × 11,656 gives 7,186,168 / 703 = 10,222.1451 – exactly the avg_row_size reported above.)

The way I’ve written the code, the row size reported for indx = n is the length of row n-1, so out of the 704 rows in x$ksuse the “interesting” rows are 176, 352, and 528, and when you check the arithmetic you realise:

  • 176 = 704 / 4,
  • 352 = 704 / 2,
  • 528 = 704 * 3/4

Maybe it’s not a coincidence that this instance is running on 4 CPUs; with that thought in mind, if I look at v$latch_children to see if there are any latches with 4 children, I find (amongst others) the following latch names:

client/application info
session allocation
session idle bit
session switching
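
The check for children is simple enough – something along these lines (a sketch, just counting children per latch name):

select
        name, count(*) child_count
from    v$latch_children
group by
        name
having  count(*) = 4
order by
        name
/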

There may be other latches that are also relevant but these 4 latches have names that suggest you might want to acquire one of them if you wanted to modify the contents of a session-related row – in other words if you wanted to modify the contents of a row in x$ksuse. So maybe Oracle has checked the cpu_count at startup and created the v$session/x$ksuse array in equal-sized pieces, one for each CPU, so that it can minimise contention when sessions try to modify data in x$ksuse. And following that thought I’ll just add that I had set sessions = 702 in my parameter file, and Oracle had rounded that up to a multiple of 4.

The hypothesis is easy to check – just bounce the database a few times, using a pfile rather than an spfile, changing the cpu_count and sessions parameters each time, and see if Oracle always rounds sessions up (if needed) to a multiple of cpu_count and then splits the array into cpu_count equal pieces.
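
A single round of that test might look like this sketch (the pfile path and the parameter values are arbitrary placeholders):

rem     in the pfile (e.g. /tmp/initTEST.ora - a hypothetical path):
rem             cpu_count = 3
rem             sessions  = 100

startup force pfile=/tmp/initTEST.ora

select  name, value
from    v$parameter
where   name in ('cpu_count', 'sessions')
order by
        name
/

If the hypothesis is right, sessions should report back as the next multiple of cpu_count (102 in this example) – and re-running x_rowsize should show the “long” rows at the 1/3 and 2/3 points of the array.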

And that certainly seemed to be the case on my little machine. (Obviously I’m not in a position to see what would happen on a machine with 256 CPUs and sessions set to 16,000 – so there might be some sanity checks for extreme cases.)

Jumping to conclusions

Going back to the output from my initial setup, it looks as if each row in x$ksuse is going to demand 10,216 bytes, and that there may be 1,440 (viz: 11,656 – 10,216) bytes needed as some sort of “header” for each segment of the array, but what sort of header could need 1,440 bytes? I can’t help noticing that if you wanted to store 176 pointers of 8 bytes each that’s a total of 1,408 bytes – so maybe the “header” is mostly a list of pointers to the rows – just like the row directory in a data block!

It’s a thought that might be heading in the right direction, but perhaps it’s a little premature. Fortunately, on one of my restarts, one of the gaps showed up as roughly minus 30,000,000. So I need to find an explanation that covers this case as well – but that’s okay because I soon came up with a hypothesis (one that I haven’t actually confirmed with a reproducible model yet).

I know that my memory granule size is 16MB (select * from V$sgainfo). What if Oracle had allocated the first few sections of x$ksuse and then found that it didn’t have enough room left in the current granule for the next piece of the array? It would have to start using another granule. What if the granules were attached to the shared pool “from the top downwards” – the next granule would have a starting address that was 16MB lower than the current granule – so the step from the end of the current granule to the beginning of the next granule would be around 32MB backwards. Allowing for the fact that I was experimenting with numbers that produced sizes of about 2MB (200 rows @ 10,216 bytes) for each piece of the array the hypothesis is reasonably consistent with the step size of minus 30 million.
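
For reference, the granule size check mentioned at the start of that paragraph can be narrowed down from “select *” to just:

select  name, bytes
from    v$sgainfo
where   name = 'Granule Size'
/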

Further experimentation is left as an exercise for the interested reader, but if I wanted to test the granule hypothesis I could probably experiment with carefully selected values for sessions, maybe starting the instance with a reduced granule size (parameter _ksmg_granule_size) to see if I could get two or more backwards jumps in the array. But I think I’ve pursued x$ksuse far enough for the moment – except that there’s one more (generic) problem it highlights.

It’s not that simple

If I query x$kqfta – the list of x$ tables – I see the following for x$ksuse:

SQL> execute print_table('select * from x$kqfta where kqftanam = ''X$KSUSE''')
ADDR                           : 00000000143D1DD0
INDX                           : 43
INST_ID                        : 1
CON_ID                         : 0
KQFTAOBJ                       : 4294951004
KQFTAVER                       : 42
KQFTANAM                       : X$KSUSE
KQFTATYP                       : 2
KQFTAFLG                       : 1
KQFTARSZ                       : 8
KQFTACOC                       : 113

-----------------

1 row(s) selected

The column kqftarsz is the “Row SiZe” and 8 is a long way off the 10,216 that I’ve derived from the addresses of consecutive rows in the table. So how does a declared row size of 8 bytes turn into a 10,216 byte row? (I wonder if it’s describing the rows in the table of pointers to the rows, not the length of the rows themselves.) But (apparently) it’s neither 8 nor 10,216 anyway, because if I look at x$kqfco (the list of x$ columns) for x$ksuse I see the following:

select
        co.indx,
        co.kqfconam     column_name,
        co.kqfcosiz     column_size,
        co.kqfcooff     column_offset
from
        x$kqfta ta,
        x$kqfco co
where
        co.kqfcotab = ta.indx
and     ta.kqftanam = 'X$KSUSE'
order by
        co.kqfcooff,
        co.indx
/

      INDX COLUMN_NAME                      COLUMN_SIZE COLUMN_OFFSET
---------- -------------------------------- ----------- -------------
       503 ADDR                                       8             0
       504 INDX                                       4             0
       505 INST_ID                                    4             0
       506 CON_ID                                     2             0
       507 KSSPAFLG                                   4             0
       508 KSSPAOWN                                   8             0
...
       603 KSUSELTXIDFLG                              4          9636
       604 KSUSELTXID                                 8          9648
       615 KSUSEFLG3                                  4         10000

According to this the last column of the row starts at byte 10,000 and ends at byte 10,003, giving us a third possible length for the row and a gap of 212 bytes to explain; and even though there are several x$ tables (x$ksuse, x$ksusio, x$ksusecst, x$ksuru, x$ksusm) built on the same internal structure and starting at the same address, none of them has a column definition explaining the gap.
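
If you wanted to verify the claim that those five tables start at the same address, a simple sketch would do it (assuming indx starts at 1 in each of them, as it did in the earlier x$ksuse listing):

select  'x$ksuse'    x_table, addr from x$ksuse    where indx = 1
union all
select  'x$ksusio'   x_table, addr from x$ksusio   where indx = 1
union all
select  'x$ksusecst' x_table, addr from x$ksusecst where indx = 1
union all
select  'x$ksuru'    x_table, addr from x$ksuru    where indx = 1
union all
select  'x$ksusm'    x_table, addr from x$ksusm    where indx = 1
/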

It looks as if any further investigation (of x$ksuse at least) will have to depend on dumping memory to trace files and chasing pointers.

Summary

When you try to find out how big a “row” in an x$ structure is you may find several pieces of contradictory information:

  1. The rowsize (kqftarsz) reported in x$kqfta.
  2. The offset + length of the final column in the description in x$kqfco.
  3. The “gap” between consecutive rows when you query addr in the object itself.

Using oradebug to peek a couple of rows may help you uncover the truth. But you may end up chasing pointers to pointers to pointers to get to the ultimate source.
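
For example, to dump the first row of x$ksuse from the earlier listing (a sketch – the address is the addr of row 1, the length is the 10,216 bytes derived above, and the trailing 1 asks oradebug to write the whole range to the trace file rather than just echoing the first few bytes on screen):

oradebug setmypid
oradebug peek 0x8E713678 10216 1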
