Oracle Scratchpad

May 20, 2022

Quiz Night

Filed under: Infrastructure,Oracle,Problem Solving,Statistics — Jonathan Lewis @ 6:09 pm BST May 20,2022

It’s a long time since I’ve done a quiz night – but here’s something that fooled me (briefly) when it appeared on the Oracle Developers’ Forum. Here’s a table definition – and I’m not going to make it easy by giving you a “create table” statement – but it’s just a simple heap table:

sql> desc interr_skuplannparam
 name                             null?    type
 -------------------------------- -------- -------------------
 atpdur                                    number(38)
 depdmdopt                                 number(38)
 externalskusw                             number(1)
 firstreplendate                           date
 lastfrzstart                              date
 lastplanstart                             date
 plandur                                   number(38)
 planleadtime                              number(38)
 planleadtimerule                          number(38)
 planshipfrzdur                            number(38)
 restrictdur                               number(38)
 allocbatchsw                              number(1)
 cmpfirmdur                                number(38)
 custservicelevel                          float(126)
 maxchangefactor                           float(126)
 mfgleadtime                               number(38)
 recschedrcptsdur                          number(38)
 cpppriority                               number(38)
 cpplocksw                                 number(1)
 criticalmaterialsw                        number(1)
 aggexcesssupplyrule                       number(38)
 aggundersupplyrule                        number(38)
 bufferleadtime                            number(38)
 maxoh                                     float(126)
 maxcovdur                                 number(38)
 drpcovdur                                 number(38)
 drpfrzdur                                 number(38)
 drprule                                   number(38)
 drptimefencedate                          date
 drptimefencedur                           number(38)
 incdrpqty                                 float(126)
 mindrpqty                                 float(126)
 mpscovdur                                 number(38)
 mfgfrzdur                                 number(38)
 mpsrule                                   number(38)
 mpstimefencedate                          date
 mpstimefencedur                           number(38)
 incmpsqty                                 float(126)
 minmpsqty                                 float(126)
 shrinkagefactor                           number(38)
 item                                      varchar2(50 char)
 loc                                       varchar2(50 char)
 expdate                                   date
 atprule                                   number(38)
 prodcal                                   varchar2(50 char)
 prodstartdate                             date
 prodstopdate                              date
 orderingcost                              float(126)
 holdingcost                               float(126)
 eoq                                       float(126)
 ff_trigger_control                        number(38)
 workingcal                                varchar2(50 char)
 lookaheaddur                              number
 orderpointrule                            number
 orderskudetailsw                          number(1)
 supsdmindmdcovdur                         number(38)
 orderpointminrule                         number(38)
 orderpointminqty                          float(126)
 orderpointmindur                          number(38)
 orderuptolevelmaxrule                     number(38)
 orderuptolevelmaxqty                      float(126)
 orderuptolevelmaxdur                      number(38)
 aggskurule                                number(38)
 fwdbuymaxdur                              number(38)
 costuom                                   number(38)
 cumleadtimedur                            number(38)
 cumleadtimeadjdur                         number(38)
 cumleadtimerule                           number(38)
 roundingfactor                            float(126)
 limitplanarrivpublishsw                   number(1)
 limitplanarrivpublishdur                  number
 maxohrule                                 number(1)
 integration_stamp                         date
 integration_jobid                not null varchar2(32 char)
 error_str                                 varchar2(2000 char)
 error_stamp                               date

The column integration_jobid (the single “not null” column) has been defined with the default value of “INT_JOB”, which takes 7 bytes to store. What’s the result of the query at the end of this little script:

truncate table interr_skuplannparam;

insert into interr_skuplannparam (atpdur) 
select   0 
from     all_objects 
where    rownum <= 10000
/

commit;

execute dbms_stats.gather_table_stats(user,'interr_skuplannparam')

select  avg_row_len 
from    user_tables
where   table_name = 'INTERR_SKUPLANNPARAM' -- added later
;

Hint: the value zero is represented internally as a single byte holding the value 0x80 (decimal 128).

Answer

It’s fairly common knowledge that Oracle includes a “length byte” in the estimates for average column length and average row length. It’s also fairly well known that “trailing nulls take no space” in a row so don’t need a length byte. There are a couple of “less common knowledge” details to add, though:

  1. any column that is null in every row gets a zero for the avg_col_len even if it isn’t a “trailing null”.
  2. but point 1 doesn’t apply to date columns that are always null: they get a 1 for the avg_col_len even if the column is a “trailing null”. The same is true for the various timestamp and interval types.
  3. for columns that hold at least one value the avg_col_len is the average over all rows of the actual space used by that column’s data, rounded up, plus 1 for the length byte.
  4. the avg_row_len is not sum(avg_col_len); it is based on the average of the summed column lengths for each row, plus the count of the length bytes recorded.
  5. User-defined types, LOBs, varray types etc. introduce all sorts of other strange effects. (But that’s probably “more common” knowledge.)

So what does that mean in my example where there’s a declared not null column near the end of the row, with two trailing columns and with every column except the first and the non-null column set to null for every single row in the table? The easy option is to create the model and show you the results of querying user_tab_cols.

break on report 
compute sum of avg_col_len on report

select  avg_row_len 
from    user_tables
where   table_name = 'INTERR_SKUPLANNPARAM'
;

select 
        column_id, segment_column_id, column_name, data_type, avg_col_len 
from
         user_tab_cols 
where 
        table_name = 'INTERR_SKUPLANNPARAM'
and     avg_col_len != 0
order by
        column_id
;


============

AVG_ROW_LEN
-----------
         20

1 row selected.


 COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME              DATA_TYPE                 AVG_COL_LEN
---------- ----------------- ------------------------ ------------------------- -----------
         1                 1 ATPDUR                   NUMBER                              2
         4                 4 FIRSTREPLENDATE          DATE                                1
         5                 5 LASTFRZSTART             DATE                                1
         6                 6 LASTPLANSTART            DATE                                1
        29                29 DRPTIMEFENCEDATE         DATE                                1
        36                36 MPSTIMEFENCEDATE         DATE                                1
        43                43 EXPDATE                  DATE                                1
        46                46 PRODSTARTDATE            DATE                                1
        47                47 PRODSTOPDATE             DATE                                1
        73                73 INTEGRATION_STAMP        DATE                                1
        74                74 INTEGRATION_JOBID        VARCHAR2                            8
        76                76 ERROR_STAMP              DATE                                1
                                                                                -----------
sum                                                                                      20

12 rows selected.


My query of user_tab_cols orders by column_id; technically it should order by segment_column_id to show the physical ordering in the data segment (allowing for all the strange effects you can get in more complex scenarios), but in this very simple case the two values are the same.

As you can see, every date type (including the trailing error_stamp) has an avg_col_len of 1, even though all the dates are null in every row. Column atpdur has avg_col_len = 2, which is 1 byte for storing zero plus a length byte, and integration_jobid has avg_col_len = 8, which is 7 bytes for storing ‘INT_JOB’ plus a length byte.
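If you want to confirm those stored sizes for yourself, the dump() function reports a value’s internal representation (the expected output, shown as comments, assumes a single-byte character set for the literal):

```sql
select  dump(0)         zero_rep,
        dump('INT_JOB') jobid_rep
from    dual
;
-- Typ=2  Len=1: 128                         (zero: the single byte 0x80)
-- Typ=96 Len=7: 73,78,84,95,74,79,66        ('INT_JOB': 7 bytes of ASCII)
```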

In this case where every single row is identical there are no rounding effects due to the calculation of average column length (the column data stored is the same in every row) so the avg_row_len = sum(avg_col_len).

Conclusion

This was an extreme (though real) production case. I wrote this note because someone was asking why there was such a massive difference between their estimate of the space needed for a table and the actual space taken by the table.

There are many reasons why this could happen, of course, but in their case their estimate was based on multiplying the avg_row_len by the number of rows in the table – and they had a large number of columns that were always null but still required space for the length byte, so the avg_row_len was about half the actual stored space for the average row. (And then the avg_row_len doesn’t allow for the row overhead and row directory entry, which total another 5 bytes per row).
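If you’re sizing a table from statistics, a sanity check against the actual segment is only a short query away — a sketch comparing the naive estimate with the bytes actually allocated (user_segments includes block overhead, pctfree and unused space, so some gap is always expected):

```sql
select  round(t.num_rows * t.avg_row_len / 1024)        est_kb,
        round(s.bytes / 1024)                           actual_kb
from    user_tables     t,
        user_segments   s
where   t.table_name    = 'INTERR_SKUPLANNPARAM'
and     s.segment_name  = t.table_name
;
```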

Lagniappe

It wasn’t part of the quiz, but once you’ve got a simple model you can always start playing with it to find out a little more, and to check how well your guesswork holds up as you make the scenario more complicated. For example, let’s update a few columns in just one row of the table, gather stats again, and report the row and column lengths.

update interr_skuplannparam
set 
        shrinkagefactor = 1234567890,
        item            = 'xxx',
        expdate         = sysdate
where
        rownum = 1
/

commit;

execute dbms_stats.gather_table_stats(user,'interr_skuplannparam')

select  avg_row_len 
from    user_tables
where   table_name = 'INTERR_SKUPLANNPARAM'
;

select 
        column_id, segment_column_id, column_name, data_type, avg_col_len 
from
        user_tab_cols 
where 
        table_name = 'INTERR_SKUPLANNPARAM'
and     avg_col_len != 0
order by
        column_id
;

============================

AVG_ROW_LEN
-----------
         22

1 row selected.


 COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME              DATA_TYPE                 AVG_COL_LEN
---------- ----------------- ------------------------ ------------------------- -----------
         1                 1 ATPDUR                   NUMBER                              2
         4                 4 FIRSTREPLENDATE          DATE                                1
         5                 5 LASTFRZSTART             DATE                                1
         6                 6 LASTPLANSTART            DATE                                1
        29                29 DRPTIMEFENCEDATE         DATE                                1
        36                36 MPSTIMEFENCEDATE         DATE                                1
        40                40 SHRINKAGEFACTOR          NUMBER                              2
        41                41 ITEM                     VARCHAR2                            2
        43                43 EXPDATE                  DATE                                2
        46                46 PRODSTARTDATE            DATE                                1
        47                47 PRODSTOPDATE             DATE                                1
        73                73 INTEGRATION_STAMP        DATE                                1
        74                74 INTEGRATION_JOBID        VARCHAR2                            8
        76                76 ERROR_STAMP              DATE                                1
                                                                                -----------
sum                                                                                      25


The total of the avg_col_len has gone up from 20 to 25 – this is two bytes each for the shrinkagefactor and item columns (a tiny average for the stored data, plus 1 for a length byte), and one extra byte for the expdate column (a tiny average for the stored data). All three values rounded up from “1 and a bit” to 2.
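To see where the “1 and a bit” comes from, vsize() reports the bytes used by the stored data (excluding the length byte). Average each of these over 10,000 rows – 9,999 of which contribute nothing – and you get a small fraction that rounds up to 1:

```sql
select  vsize(1234567890)       n_bytes,  -- 6: exponent byte + 5 base-100 mantissa bytes
        vsize('xxx')            c_bytes,  -- 3: one byte per character
        vsize(sysdate)          d_bytes   -- 7: dates are a fixed 7 bytes
from    dual
;
```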

The avg_row_len, however, has gone up by only 2 – which I am going to assume is the two new length bytes, with no allowance for the impact of the one row in 10,000 that is now a few bytes longer. It looks as if the rounding rules for the row length may be different from the rounding (up) rules for the column length.

May 4, 2022

redefinition error

Filed under: Bugs,Infrastructure,Oracle — Jonathan Lewis @ 12:22 pm BST May 4,2022

Here’s a note about a data error generated by using (possibly mis-using) the dbms_redefinition package (jump to conclusion). The original code to demonstrate the problem comes from a note on the Oracle Developer forum, and was brought to my attention by a tweet from Daniel Stein.

The critical feature of the demo is that we can end up with a column containing nulls despite being declared NOT NULL (and I don’t mean by that a simple “is not null” check constraint – which is not quite the same as a NOT NULL declaration).

Here’s the first part of a script, mostly copied from the forum post, that I’ve been running on 19.11.0.0:

rem
rem     Script:         redef_bug.sql
rem     Author:         Jonathan Lewis / Sebastian (User_6AT2M)
rem     Dated:          May 2022
rem
rem     Last tested 
rem             19.11.0.0
rem

create table test_nulls (
        id      number (8)      constraint tn_pk primary key,
        text    varchar2 (25)   constraint tn_nn_tx not null
)
/

create unique index i_test_nulls on test_nulls(text);

insert into test_nulls (id, text) 
select rownum, 'some text '||rownum from dual connect by level <= 50;

-- create an empty copy, without the constraints or indexes

create table test_nulls_interim (
        id      number (8),
        text    varchar2 (25)
);

begin
        dbms_redefinition.start_redef_table(
                uname           => user,
                orig_table      => 'test_nulls',
                int_table       => 'test_nulls_interim',
                col_mapping     => 'id id, substr(text, id, 2) text'
        );
end;
/

The script creates a table with a primary key declared on an id column, a not null declaration on a text column and a unique index on the text column, then populates the table with 50 rows that have the form (N, “some text N”) where N is a number between 1 and 50.

After creating an empty copy of the table with no constraints or indexes I start an online redefinition – modifying the content of the text column as part of the redefinition. If you check the col_mapping carefully you will realise that when id reaches 13 the result from the substr() function becomes null.
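You can see the effect of that col_mapping without running a redefinition at all – just apply the same expression to a copy of the source data:

```sql
select  id,
        'some text '||id                source_text,
        substr('some text '||id, id, 2) mapped_text
from    (select rownum id from dual connect by level <= 15)
;
-- for id >= 13 the start position is past the end of the
-- string, so substr() returns null
```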

This is where the trouble starts. If I now call dbms_redefinition.copy_table_dependents() to add the original constraints and indexes to the interim table, what’s Oracle going to do about the not null declaration on the text column?

declare
        error_ct pls_integer;
begin
        dbms_redefinition.copy_table_dependents(
                uname           => user,
                orig_table      => 'test_nulls',
                int_table       => 'test_nulls_interim',
                num_errors      => error_ct,
                copy_indexes    => 1,
                copy_constraints=> true,
                ignore_errors   => false,
--
                copy_triggers   => false,
                copy_privileges => false,
                copy_statistics => false,
                copy_mvlog      => false
        );  

        dbms_output.put_line('error count: '||error_ct);
end;
/

begin
        dbms_redefinition.finish_redef_table(user, 'test_nulls', 'test_nulls_interim');
end;
/

drop table TEST_NULLS_INTERIM purge;

I’ve exposed all the parameters to the copy_table_dependents() procedure call in my code, and you can see that I’ve chosen to copy only the constraints and indexes, and I don’t want to ignore errors.

The PL/SQL anonymous block terminates successfully, doesn’t report any errors, and outputs an error count of zero. So let’s see what we’ve got as the final result of the redefinition.

column search_condition_vc format a20
break on table_name skip 1 on index_name
set echo on

select
         table_name, index_name, column_name 
from 
        user_ind_columns 
order by 
        1,2,column_position
/

desc test_nulls

select count(*) from test_nulls;
select /*+ full(test_nulls) */ count(*) from test_nulls;

select
        constraint_name,
        constraint_type,
        search_condition_vc,
        status,
        deferrable,
        deferred,
        validated
from
        user_constraints
where
        table_name = 'TEST_NULLS'
/

set echo off

Here are the results – my login.sql has a lot of column format commands so your results may look a lot messier if you run this bit of code. Here’s the echoed output:

SQL> select
  2           table_name, index_name, column_name
  3  from
  4          user_ind_columns
  5  order by
  6          1,2,column_position
  7  /

TABLE_NAME                INDEX_NAME           COLUMN_NAME
------------------------- -------------------- --------------------
TEST_NULLS                I_TEST_NULLS         TEXT
                          TN_PK                ID


2 rows selected.

SQL> 
SQL> desc test_nulls
 Name                                                                     Null?    Type
 ------------------------------------------------------------------------ -------- -------------------------------------------------
 ID                                                                       NOT NULL NUMBER(8)
 TEXT                                                                     NOT NULL VARCHAR2(25)

SQL> 
SQL> select count(*) from test_nulls;

  COUNT(*)
----------
        12

1 row selected.

SQL> select /*+ full(test_nulls) */ count(*) from test_nulls;

  COUNT(*)
----------
        50

1 row selected.

SQL> 
SQL> select
  2          constraint_name,
  3          constraint_type,
  4          search_condition_vc,
  5          status,
  6          deferrable,
  7          deferred,
  8          validated
  9  from
 10          user_constraints
 11  where
 12          table_name = 'TEST_NULLS'
 13  /

CONSTRAINT_NAME      C SEARCH_CONDITION_VC  STATUS   DEFERRABLE     DEFERRED  VALIDATED
-------------------- - -------------------- -------- -------------- --------- -------------
TN_NN_TX             C "TEXT" IS NOT NULL   ENABLED  NOT DEFERRABLE IMMEDIATE VALIDATED
TN_PK                P                      ENABLED  NOT DEFERRABLE IMMEDIATE VALIDATED

Both indexes have arrived; both columns have NOT NULL declarations.

When you count the number of rows in the table it’s only 12 – unless you force a full tablescan in which case it’s 50. This happens because the NOT NULL declaration of column text allows the optimizer to use an index-only plan on the index i_test_nulls and there are 38 nulls in the table that don’t appear in the index.
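A query that drags the inconsistency into the open is one the index can’t answer – ask explicitly for the rows that the validated constraint says can’t exist:

```sql
select  count(*)
from    test_nulls
where   text is null
;
-- reports 38 on the damaged table: rows that, according to the
-- enabled and validated NOT NULL constraint, should be impossible
```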

The check on user_constraints shows that both the primary key constraint and the “is not null” check constraint are enabled and validated.

Conclusion

There is code in the copy_table_dependents() procedure that is defective, deficient and dangerous. Clearly I’ve done something that I probably shouldn’t have done (and, historically, I wouldn’t have done) but the code should still have protected me from an error that leaves the database in a state that is not internally consistent and can produce incorrect results.

In my example the copy_table_dependents() procedure has allowed me to add a NOT NULL declaration to a column after the col_mapping parameter of start_redef_table() has resulted in rows that have nulls in that column. There may be variants on this theme that allow other types of error to appear.

Notes

If you want to repeat this test and try some variations on the theme you will need a few non-trivial privileges (which can be granted through a role). For the basic dbms_redefinition package you will need: execute on dbms_redefinition, select any table, create any table, alter any table, lock any table, drop any table; and to execute the copy_table_dependents() procedure you will also need create any index, create any trigger.

The copy_table_dependents() procedure appeared (I think) in the 10g time-line; prior to that you had to sort out all the constraints and dependencies “by hand” – which you would probably manage online through the sync_interim_table() procedure (viz: “create an index, sync interim table, create next index, sync …” etc.) before calling the finish_redef_table() procedure. That being the case, my immediate response to this issue was that if you don’t want the not null declaration on text then you have to exclude the copy_constraints option when copying the table dependents; if you did want the not null declaration then you should have included it in the initial definition of the interim table, because the start_redef_table() call would then have failed, raising:

ORA-12008: error in materialized view or zonemap refresh path
ORA-01400: cannot insert NULL into ("{schema}"."TEST_NULLS_INTERIM"."TEXT")
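The manual, pre-10g style described above can be sketched as follows – each dependent object is created by hand, with a sync between steps (index name here is illustrative):

```sql
-- add one dependent object at a time to the interim table ...
create index i_interim_text on test_nulls_interim(text);

-- ... keeping the interim table up to date between steps
begin
        dbms_redefinition.sync_interim_table(
                uname           => user,
                orig_table      => 'test_nulls',
                int_table       => 'test_nulls_interim'
        );
end;
/
```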

April 4, 2022

PLSQL_WARNINGS

Filed under: Infrastructure,Oracle,Tuning — Jonathan Lewis @ 10:45 am BST Apr 4,2022

I don’t suppose many people edit and compile PL/SQL code from the SQL*Plus command line any more, but the following reminder about avoiding mistakes is probably valid in other development environments, even though it may be enabled through a different mechanism.

You can set the level of warning that you get from the PL/SQL compiler – and the default value isn’t the best value to use if you want your PL/SQL to be efficient. Here’s a demonstration based on a common, trivial, but annoying error – it starts with an indexed table:

rem
rem     Script:         plsql_warning_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2017
rem
rem     Last tested 
rem             19.11.0.0
rem             11.2.0.4
rem

create table t1(
        id      varchar2(10),
        v1      varchar2(32)
);

create index t1_i1 on t1(id);

insert into t1 
select  rownum, object_name 
from    all_objects
where   rownum <= 1000
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

Note that although I’ve declared the id column as a varchar2() type I’m populating it with a number – that will, of course, be coerced into a character format. This is an example of something I’ve seen quite often in production systems – a character column storing something that (always) looks like a number – and it leads to the following type of coding defect:

create or replace function f1 (i_in number)
return varchar2
authid definer
as
        m_v1 t1.v1%type;
begin
        select  t1.v1
        into    f1.m_v1
        from    t1
        where   t1.id = f1.i_in ;

        return  f1.m_v1;

end;
/

show errors

The error should be quite easy to see in this example – but in a more complex example with a longer piece of code it might not be so visible. Still, I’ve got the “show errors” call immediately after my “create or replace” so that should help. Here’s the output from running the above:

Function created.

No errors.

There is an error in the code – but it’s not one that leads to a PL/SQL compilation error. My incoming parameter is numeric, and I’m using it in the predicate checking t1.id – which is an indexed character column – but that means the CBO will modify the predicate internally to to_number(t1.id) = :b1, disabling the index. The function will work (provided the full tablescan doesn’t find a value in the table that raises a conversion error), but the performance probably won’t be what I’d hoped for.
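For reference, the fix is simply to convert the input parameter explicitly and leave the indexed column untouched – a sketch of the corrected function (though for the rest of this note I’ll keep the flawed version to see what the compiler can tell us about it):

```sql
create or replace function f1 (i_in number)
return varchar2
authid definer
as
        m_v1 t1.v1%type;
begin
        select  t1.v1
        into    f1.m_v1
        from    t1
        where   t1.id = to_char(f1.i_in);  -- convert the input, not the column

        return  f1.m_v1;
end;
/
```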

Let’s recompile the procedure, but precede it with an alter session statement:

alter session set plsql_warnings = 'enable:all';
alter function f1 compile;
show errors

And here’s the resulting output:

Session altered.

SP2-0807: Function altered with compilation warnings

Errors for FUNCTION F1:

LINE/COL ERROR
-------- -----------------------------------------------------------------
10/8     PLW-07204: conversion away from column type may result in
         sub-optimal query plan

The compile succeeded (“function altered …warnings”), so the function will still execute and return the correct result, but we’ve had an explicit warning of exactly what we’ve done wrong and the effect it will have. But you can go one better – if you know your PLSQL error numbers:

alter session set plsql_warnings = 'enable:all','error:7204';
alter function f1 compile;
show errors

In this example I’ve added an extra option to the plsql_warnings parameter – I’ve told it to treat PLW-7204 as an error, not just as a warning, so this is what we see:

Session altered.


Warning: Function altered with compilation errors.

Errors for FUNCTION F1:

LINE/COL ERROR
-------- -----------------------------------------------------------------
10/8     PLS-07204: conversion away from column type may result in
         sub-optimal query plan

The difference doesn’t leap out and hit you in the eye, but instead of “with compilation warnings” the message now says “with compilation errors”. (It’s also lost the SP2-0807 prefix, but that’s not important.)

Since PLW-07204 is now an error the function state is invalid, and if you try using it in a query the query will raise Oracle error ORA-06575: Package or function F1 is in an invalid state.

If you want to see how your packages, procedures and functions were last compiled you can query one of two views:

column param_name  format a32
column param_value format a32

select  param_name, param_value
from    user_stored_settings
where   object_name = 'F1'
and     object_type = 'FUNCTION'
/

PARAM_NAME                       PARAM_VALUE
-------------------------------- --------------------------------
plsql_optimize_level             2
plsql_code_type                  INTERPRETED
plsql_debug                      FALSE
nls_length_semantics             BYTE
plsql_warnings                   ENABLE:ALL,ERROR:  7204
plsql_ccflags
plscope_settings                 IDENTIFIERS:NONE
plsql_compiler_flags             INTERPRETED,NON_DEBUG

8 rows selected.

(The other view is a denormalized/pivoted version of this view, giving you one row and 8 columns for each object. [Ed: see comment #2, the reference manual for 21c says the view I’ve used above is deprecated and I should have shown you user_plsql_object_settings which is now (will be) the approved view])

If you want to make sure that you recompile the function with the same settings that you used previously you can add the clause “reuse settings” to the “alter function f1 compile” call; if you don’t, the function will compile with whatever your current session settings are (which may have been dictated by the system settings).
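A minimal reminder of the syntax:

```sql
-- recompile without losing the per-object settings recorded at the
-- last compilation (plsql_warnings, plsql_optimize_level, etc.)
alter function f1 compile reuse settings;
```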

There are variations on this theme – if you check the “alter compile” syntax you’ll see that you can include “parameter = value” clauses in the call to compile so, for example, I could start a new session and issue:

alter function f1 compile plsql_warnings='error:7204';

This would give me exactly the same warning, and the same output on a subsequent “show errors” – though in this case the output from user_stored_settings would be: “DISABLE:ALL,ERROR: 7204”.

If you want a complete list of all the possible pl/sql warnings you can find them in $ORACLE_HOME/plsql/mesg/plwus.msg. The warnings fall into three categories: Severe, Informational, and Performance, and there is a short note in the message file giving the ranges:

/   SEVERE -- For this category the warning number should be between
/             5000 and 5999.
/   INFORMATIONAL - For this category the warning number should be between
/                   6000 and 6249.
/   PERFORMANCE   - For this category the warning number should be between
/                   7000 and 7249.

It’s worth taking a look at the possible errors – even if you don’t take advantage of the feature. There aren’t very many, but one I particularly like is Informational 6002: “Unreachable code”, which tells you exactly which bits of your PL/SQL are never going to run. (I wonder how many people would find code that failed to recompile if they set plsql_warnings to ‘error:6002’.)
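A minimal example that trips the warning – a hypothetical function (f2) with a statement after an unconditional return:

```sql
alter session set plsql_warnings = 'enable:all';

create or replace function f2 return varchar2
authid definer
as
begin
        return 'done';
        dbms_output.put_line('unreachable');  -- flagged: PLW-06002 Unreachable code
end;
/
show errors
```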

Summary

It’s worth knowing about the option to set the parameter plsql_warnings as it may avoid accidental inefficiency in SQL that’s embedded in PL/SQL, and it may highlight coding errors (like “unreachable code”) that otherwise manage to slip past test suites.

Personally I’d be inclined to set it to something quite aggressive on development systems to help developers spot errors as early and as easily as possible.

March 28, 2022

Drop column bug

Filed under: Bugs,Infrastructure,Oracle — Jonathan Lewis @ 11:29 am BST Mar 28,2022

In the previous note about a problem dropping virtual columns the “guilty party” that made it impossible to drop any columns was based on a complex data type owned by the MDSYS (Spatial) schema. This note demonstrates the same problem with a very simple example created from scratch in an ordinary user schema.

rem
rem     Script:         object_virtual_col.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2022
rem
rem     Last tested 
rem             19.11.0.0
rem

create type point_type as object(x_coord number, y_coord number);
/

create or replace function my_point(inpoint in point_type)
return point_type
deterministic as 
begin
        return inpoint;
end;
/

show errors

create table t1 (id, n1, p1, n2, v1, padding)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          n1,
        point_type(rownum, rownum)      p1,
        rownum                          n2,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1
where
        rownum <= 100   -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

alter table t1 add constraint t1_pk primary key(id);

So I’ve declared a type “point” which is an object with two attributes of type number, and I’ve created a function that takes a point as its input parameter and returns a point. Then I’ve created a table which includes a column of type point.

Let’s start with a little reminder of what a pain it is to use even simple object types correctly. What’s going to happen with the following three SQL statements:

select    p1.x_coord from t1    where rownum <= 4;
select t1.p1.x_coord from t1    where rownum <= 4;
select t1.p1.x_coord from t1 t1 where rownum <= 4;

The first two will fail – the first one shouldn’t be too surprising, the second does seem a little unreasonable:

ORA-00904: "P1"."X_COORD": invalid identifier
ORA-00904: "T1"."P1"."X_COORD": invalid identifier

So let’s try adding some virtual columns to pick out the X value:

alter table t1 add x_val generated always as (p1.x_coord) virtual;
alter table t1 add c_val generated always as (cast(p1.x_coord as number)) virtual;

The first call will fail (ORA-54016: Invalid column expression was specified) but the second will succeed. What if we try to hide our point column behind a call to our function:

alter table t1 add fp_val generated always as (my_point(p1)) virtual;
alter table t1 add fx_val generated always as (my_point(p1).x_coord) virtual;

Again the first call will fail (ORA-54004: resultant data type of virtual column is not supported) but that’s a documented restriction – a user-defined type may not be used as the type of a virtual column and I wasn’t at that point trying to return just the one attribute.

The second call, however, will succeed. So I can’t create a virtual column p1.x_coord, but I can create a virtual column my_point(p1).x_coord.

We now have two virtual columns that should return the required values, so let’s do a quick check with a couple of simple queries – cut and paste:

SQL> select fx_val "my_point(p1).x_coord" from t1 where rownum <= 4;

my_point(p1).x_coord
--------------------
                   1
                   2
                   3
                   4

4 rows selected.

SQL> select c_val  "cast(p1.x_coord as -" from t1 where rownum <= 4;

cast(p1.x_coord as -
--------------------
                   1
                   2
                   3
                   4

4 rows selected.

Finally we’ll finish off by demonstrating that I’ve just created a problem that no-one will notice until long after I’ve left the site (maybe):

SQL> alter table t1 drop column n1;
alter table t1 drop column n1
 *
ERROR at line 1:
ORA-00904: "TEST_USER"."MY_POINT": invalid identifier

After creating (and using successfully) the virtual column that calls my function, I can’t drop any of the columns in the table.

Summary

The manuals state a restriction for virtual columns: they cannot be of a user-defined type, and this restriction seems to carry forward to an attribute of a user-defined type unless the attribute has been cast() to a base type.

The same restriction seems to apply to functions returning a user-defined type, but not to the individual attributes of the returned value – it is not necessary to cast() them to a base type. However, if you (accidentally) take advantage of this relaxation of the restriction you will be unable to drop any columns from the table in the future.

March 24, 2022

Drop column bug

Filed under: Bugs,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 11:07 am GMT Mar 24,2022

Here’s a problem that appeared recently on the Oracle Developer Forum showing one of the classic symptoms of new features, namely that “mix and match” often runs into problems. This example has been a long time coming, so “new” is something of a misnomer, but the alternative source of unlucky collisions is in the “rare” features – in this case Spatial. (It’s possible that the problem is not restricted to just Spatial, but it probably does require a somewhat exotic data type.)

The problem appeared in a thread on the Oracle Developer Forum from someone who was trying to drop a pair of columns and finding that the statement failed with a surprising choice of error: ORA-00904: Invalid Identifier. The surprising thing about this error was that the named identifier was clearly not invalid. The suspicion that this was an example of “new features colliding” arose because the columns to be dropped were virtual columns based on a real column of the table that had been declared as an object type defined in the MDSYS (Spatial) schema.

Conveniently the author had supplied a short, simple, script to demonstrate the issue, so I copied it and modified it a bit to do a few tests around the edges of the problem. Here’s the code that I used to start my investigation:

rem
rem     Script:         drop_col_bug.sql
rem     Author:         Jonathan Lewis/User_77G7L
rem     Dated:          Mar 2022
rem     Purpose:        
rem
rem     Last tested 
rem             21.3.0.0        Still broken
rem             19.11.0.0
rem

create table xxx (
        v1              varchar2(10),
        n1              number,
        shape           mdsys.sdo_geometry,
        x_value         number generated always as (mdsys.sdo_geom.sdo_pointonsurface(shape,0.005).sdo_point.x) virtual,
        y_value         number generated always as (mdsys.sdo_geom.sdo_pointonsurface(shape,0.005).sdo_point.y) virtual,
        v2              varchar2(10),
        n2              number,
        n3              number
)
segment creation immediate
;

insert into xxx(v1, n1, v2, n2, n3) values('z',1,'y',2,3);
update xxx  set
        shape = sdo_geometry(
                2003,                           -- two-dimensional polygon
                null,
                null,
                sdo_elem_info_array(1,1003,3),  -- one rectangle (1003 = exterior)
                sdo_ordinate_array(1,1, 5,7)    -- only 2 points needed to define rectangle
        )
;


commit;

alter table xxx drop (x_value, y_value) ;

The modifications I made from the original code are:

  • I’ve removed a couple of redundant sets of parentheses from the virtual column definitions
  • I’ve added a few columns before, in between, and after the virtual columns
  • I’ve used “segment creation immediate”
  • I’ve inserted a row into the table

The last two are simply to ensure that I have data segments and at least one item for the drop to work on – just in case it’s a run-time problem being reported as a parse time issue.

The extra columns are to test whether the type and position of the column I drop affects the outcome, and the change in parentheses is purely aesthetic.

Here’s the result of the attempt to drop the virtual columns:

alter table xxx drop (x_value, y_value)
  *
ERROR at line 1:
ORA-00904: "MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE": invalid identifier

This is a little strange since I have used the packaged function mdsys.sdo_geom.sdo_pointonsurface() to define the virtual columns and Oracle didn’t complain when I created them. (Part of the reason I had reduced the original parentheses was to check that the compiler hadn’t got confused by an excess of parentheses.)

As a quick “what if” test I tried using the alternative syntax for drop column that you can use with just one column:

SQL> alter table xxx drop column x_value;
alter table xxx drop column x_value
  *
ERROR at line 1:
ORA-00904: "MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE": invalid identifier

What about trying to set the column unused before dropping all unused columns?

SQL> alter table xxx set unused column x_value;
alter table xxx set unused column x_value
  *
ERROR at line 1:
ORA-00904: "MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE": invalid identifier

So is the problem restricted to the virtual columns – what happens if I try to drop a column from the end of the table, what about the one between the two virtual columns, how about a column that appears before even the shape column? Nothing changes:

SQL> alter table xxx drop column v1;
alter table xxx drop column v1
  *
ERROR at line 1:
ORA-00904: "MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE": invalid identifier


SQL> alter table xxx set unused column v1;
alter table xxx set unused column v1
  *
ERROR at line 1:
ORA-00904: "MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE": invalid identifier

What if I have only one of the virtual columns? No difference.

What if I don’t have either of the virtual columns? Finally I can drop any column I like from the table (including the shape column). Not that that’s much use to the user.

You can’t set unused or drop any columns in the table thanks to an error that looks as if it’s associated with the definition of those virtual columns.

Workaround

Is there any way to bypass the problem and still store the information we need (until we want to drop it)? Let’s start by taking a look at the way Oracle has used our table definition to create column definitions, just in case that gives us a clue:

select 
        column_id id, segment_column_id seg_id, internal_column_id int_id, 
        column_name, data_type, data_default  
from 
        user_tab_cols 
where 
        table_name = 'XXX' 
order by 
        column_id,
        internal_column_id
/

        ID     SEG_ID     INT_ID COLUMN_NAME          DATA_TYPE                 DATA_DEFAULT
---------- ---------- ---------- -------------------- ------------------------- --------------------------------------------------------------------------------
         1          1          1 V1                   VARCHAR2
         2          2          2 N1                   NUMBER
         3          3          3 SHAPE                SDO_GEOMETRY
         3          4          4 SYS_NC00004$         NUMBER
         3          5          5 SYS_NC00005$         NUMBER
         3          6          6 SYS_NC00006$         NUMBER
         3          7          7 SYS_NC00007$         NUMBER
         3          8          8 SYS_NC00008$         NUMBER
         3          9          9 SYS_NC00009$         SDO_ELEM_INFO_ARRAY
         3         10         10 SYS_NC00010$         SDO_ORDINATE_ARRAY
         4                    11 X_VALUE              NUMBER                    (("MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE"("SHAPE",0.005))."SDO_POINT")."X"
         5                    12 Y_VALUE              NUMBER                    (("MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE"("SHAPE",0.005))."SDO_POINT")."Y"
         6         11         13 V2                   VARCHAR2
         7         12         14 N2                   NUMBER
         8         13         15 N3                   NUMBER

15 rows selected.

There’s quite a lot going on there in terms of columns hidden behind the sdo_geometry type. In fact internal columns 9 and 10 might prompt you to look for other objects like table types or LOBs:

SQL> select column_name, segment_name, index_name from user_lobs where table_name = 'XXX';

COLUMN_NAME                    SEGMENT_NAME                   INDEX_NAME
------------------------------ ------------------------------ ------------------------------
"SHAPE"."SDO_ELEM_INFO"        SYS_LOB0000100168C00009$$      SYS_IL0000100168C00009$$
"SHAPE"."SDO_ORDINATES"        SYS_LOB0000100168C00010$$      SYS_IL0000100168C00010$$

2 rows selected.

But the interesting detail is the data_default column for our two virtual columns – which have more parentheses than the original definitions. Perhaps the storage of the expression has gone wrong (as happened in an older version of Oracle with case expressions) and is causing the ORA-00904 error to appear. So let’s try selecting data from the table using the expression stored in the data dictionary:

select
        ((MDSYS.SDO_GEOM.SDO_POINTONSURFACE(SHAPE,0.005)).SDO_POINT).X          old_x,
          mdsys.sdo_geom.sdo_pointonsurface(shape,0.005).sdo_point.x            new_x,
        ((MDSYS.SDO_GEOM.SDO_POINTONSURFACE(SHAPE,0.005)).SDO_POINT).Y          old_y
from
        xxx
/

     OLD_X      NEW_X      OLD_Y
---------- ---------- ----------
         1          1          1

1 row selected.


No syntax error there – as far as a simple select is concerned. I’ve included my tidier format for the x_value column aligned with the resulting stored value (with all the double quotes removed – though I’ve also tested it with the quotes in place) – and the only significant visual difference is the number of parentheses, so maybe that’s a clue. In particular we note that the error reports “MDSYS”.”SDO_GEOM”.”SDO_POINTONSURFACE” as the invalid identifier and the first time an extra (close) parenthesis appears is just after that function call. Maybe (for no good reason) the code path involved with handling column data during a drop/set unused call is getting confused by parentheses. So let’s try to reduce the complexity of the expression by hiding it inside a local function.

First attempt – create a function to return an sdo_point_type and define the virtual columns to expose the X and Y values from the point:

create or replace function my_point(
        inshape         in mdsys.sdo_geometry,
        intolerance     in number
)
return mdsys.sdo_point_type
deterministic
as
begin
        return mdsys.sdo_geom.sdo_pointonsurface(inshape, intolerance).sdo_point;
end;
/

        x_value         number generated always as (my_point(shape,0.005).x) virtual,
        y_value         number generated always as (my_point(shape,0.005).y) virtual,

This approach still produces an ORA-00904, though the invalid identifier becomes “TEST_USER”.”MY_POINT”.

Second attempt – two separate functions, one for the x value, one for the y value:

create or replace function my_x(
        inshape         in mdsys.sdo_geometry,
        intolerance     in number
)
return number
deterministic
as
begin
        return   mdsys.sdo_geom.sdo_pointonsurface(inshape, intolerance).sdo_point.x;
end;
/

show errors

create or replace function my_y(
        inshape         in mdsys.sdo_geometry,
        intolerance     in number
)
return number
deterministic
as
begin
        return   mdsys.sdo_geom.sdo_pointonsurface(inshape, intolerance).sdo_point.y;
end;
/

        x_value         number generated always as (my_x(shape, 0.005)) virtual,
        y_value         number generated always as (my_y(shape, 0.005)) virtual,

This worked, so finally I looked at the SQL Language Reference manual to see if there were any restrictions on virtual columns that might explain the problem I had had with all the previous definitions (and, yes, I know I should have done that right after the first failure), and I found the following:

  • The virtual column cannot be an Oracle supplied data type, a user-defined type, or LOB or LONG RAW.

None of my virtual column definitions returned an Oracle supplied data type or a user-defined data type. But would the restriction also apply to single attributes of such a data type, or has the complexity of spatial data types found a hole in the code? The answer to that question is a whole new blog note waiting to be finished (because I’d forgotten what a pain it is to use object types in simple SQL).

March 21, 2022

v$_fixed_view_definition

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 7:08 pm GMT Mar 21,2022

In one of my notes about excavating the x$ tables I said that there was a problem “hidden in plain sight” with the dynamic performance view (g)v$fixed_view_definition. If you haven’t previously spotted the problem here’s what the view looks like when you describe it:

SQL> desc gV$fixed_view_definition
 Name                                       Null?    Type
 ----------------------------------------- -------- ----------------------------
 INST_ID                                            NUMBER
 VIEW_NAME                                          VARCHAR2(128)
 VIEW_DEFINITION                                    VARCHAR2(4000)
 CON_ID                                             NUMBER

and if that doesn’t make you suspicious, here’s a clue:

SQL> select  substr(view_definition,3950,50) tail_end
  2  from    gv$fixed_view_definition
  3  where   view_name = 'GV$SESSION'
  4  /

TAIL_END
--------------------------------------------------------------------------------
tand(s.ksuseflg2,64),64,'TRUE','FALSE'),decode(bit

Once upon a time the dynamic performance views were much simpler and their definitions would all have fitted inside a varchar2(4000) – but those were the good old days and things have moved on since then.

The view gv$session is one of the biggest offenders: you can’t even get as far as the list of x$ structures in the from clause within the first 4,000 bytes. If you can’t see the entire definition it can be a little difficult to work out why a query against one of these views is performing badly; it is still possible, of course, but harder than it needs to be.

If you need to work around this issue one strategy would be to ask Oracle where it’s hidden the view definition in shared memory. It’s not the easiest option but it may be the only one available to you. You start by modifying the defining query for gv$fixed_view_definition into something that will report the address of the row in x$kqfvt that holds the view text:

select 
        i.addr, t.addr 
from 
        x$kqfvi i, 
        x$kqfvt t
where 
        i.kqfvinam = 'GV$SESSION'
and     t.indx = i.indx
/

ADDR             ADDR
---------------- ----------------
00000000143490A0 00000000143660C0

If you use the x_rowsize.sql script I published a little while ago to see what it says about the lengths of x$kqfvt rows you’ll find that the row length is 32 bytes, though if you check the definition of x$kqfvt using the x_desc.sql script from the original “excavating x$” notes you find the following:

Column                      Ext Type   Int Type  Array Max   Len Size Len Offset   Col Size     Offset      Index    Idx Col
------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
ADDR                              23          9          0          0          0          8          0          1          0
INDX                               2         11          0          0          0          4          0          2          0
INST_ID                            2         11          0          0          0          4          0          0          0
CON_ID                             2         11          0          0          0          2          0          0          0
KQFTPSEL                           1          6          0          0          0       4000          0          0          0

The only real column in the structure is KQFTPSEL and the “type” columns tell us that it is a pointer (8 bytes) to a string of 4000 bytes – which makes the 32 byte gap between rows a little odd, so let’s “peek” the 32 bytes starting at address 0x143660C0 to see what else (if anything) we can find in that row:

SQL> oradebug setmypid
Statement processed.

SQL> oradebug peek 0x143660C0 32 
[0143660C0, 0143660E0) = 14598D40 00000000 143754A0 00000000 00000000 00000000 00000000 00000000

That looks like 2 addresses (of 8 bytes each) and 16 null bytes which I can’t explain. The first address should get us to the view text, and I’m curious about the second address, so I’ll do a couple of large peeks for these two addresses, and dump the results into my session trace file by appending a ‘1’ to the command:
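As a cross-check of that reading, here’s a little Python sketch (my own illustration, not part of the original investigation) that reassembles the two 8-byte pointers from the 4-byte words in the peek output – on this little-endian platform each pointer is dumped low word first:

```python
# Reassemble the two 8-byte pointers from the oradebug peek output.
# Oracle dumps memory as 4-byte words; an 8-byte pointer appears as
# two consecutive words with the low-order word first.
words = ["14598D40", "00000000", "143754A0", "00000000",
         "00000000", "00000000", "00000000", "00000000"]

def qword(lo, hi):
    """Combine two 4-byte hex words (low word first) into one value."""
    return (int(hi, 16) << 32) | int(lo, 16)

view_text_ptr = qword(words[0], words[1])
column_ptr    = qword(words[2], words[3])
print(hex(view_text_ptr), hex(column_ptr))  # -> 0x14598d40 0x143754a0
```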

SQL> oradebug peek 0x14598D40 256 1
[014598D40, 014598E40) = 656C6573 73207463 736E692E 64695F74 612E732C 2C726464 6E692E73 732C7864 75736B2E 65736573 2E732C72 7575736B 73657364 6B2E732C ...

SQL> oradebug peek 0x143754A0 256 1
[0143754A0, 0143755A0) = 00000007 00000000 139779D0 00000000 00000005 00000000 1455A464 00000000 00000003 00000000 14BD2FAC 00000000 00000007 00000000 ...

The first peek looks promising – the hex values are in the “ASCII alphanumerics” range and reading each block of 4 bytes from right to left we get: ‘sele’ ‘ct s’ ‘.ins’ ‘t_id’ which, when we string it together, gives us “select s.inst_id” – the start of the view text.
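Reading those little-endian words by eye is error-prone, so here’s a short Python sketch (my illustration) that decodes the first four words of the dump mechanically:

```python
# Decode the first four 4-byte words of the peeked view text.  Each
# word's bytes have to be reversed (little-endian storage) to recover
# the original character order.
words = ["656C6573", "73207463", "736E692E", "64695F74"]
text = b"".join(bytes.fromhex(w)[::-1] for w in words).decode("ascii")
print(text)  # -> select s.inst_id
```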

The second peek looks as if it’s reporting pairs of (length of string, pointer to string) so I peeked at the first few addresses and found the strings: “INST_ID“, “SADDR“, “SID“, “SERIAL#” which looks like the first few names of the columns in gv$session.
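The lengths in that dump line up nicely with the names found: the first word of each (length, pointer) pair – 0x7, 0x5, 0x3, 0x7 – matches the column-name lengths, as a quick Python check confirms:

```python
# The first value of each (length, pointer) pair in the second peek
# should match the length of the corresponding gv$session column name.
names = ["INST_ID", "SADDR", "SID", "SERIAL#"]
lengths_from_dump = [0x7, 0x5, 0x3, 0x7]
assert [len(n) for n in names] == lengths_from_dump
```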

Of course we haven’t yet seen anything telling us the length of the view text or the number of columns in the view – but maybe that’s in the x$kqfvi structure, and you’ll notice that I included the address (addr) from that structure in my query. Here’s what I get from peeking the 80 bytes (the row size indicated by x_rowsize.sql) at that address:

0143490A0 0000000A 00000000 145528BC 00000000  [.........(U.....]
0143490B0 00000004 00000000 12F5D384 00000000  [................]
0143490C0 00000003 00000000 00000000 00000000  [................]
0143490D0 0000119F 00000000 00000068 00000000  [........h.......]
0143490E0 FFFFC15A 00000022 00000000 00000000  [Z..."...........]

The first line is a pointer to the character string “GV$SESSION” (0xA is the length), the second line is a pointer to the character string “NULL” (0x4 is the length) – but I don’t know why that’s there (yet), and I don’t know what the third line is about. The fourth line holds the numbers 4511 (dec) and 104 (dec); since the latter is the number of columns in gv$session, would anyone care to guess that the former is the length of the view text? The last line reports the kqfviobj and kqfviver as seen in the view definition.
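A quick Python sanity check (my illustration) of the hex-to-decimal conversions in that reading:

```python
# Check the values read from the x$kqfvi row dump: 0xA matches the
# length of "GV$SESSION", and the fourth line's 0x119F and 0x68
# decode to 4511 and 104 respectively.
assert int("A", 16) == len("GV$SESSION")   # 10
view_text_length = int("119F", 16)
column_count = int("68", 16)
print(view_text_length, column_count)      # -> 4511 104
```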

The interesting question is whether we can find the full text of the view at the address we peeked because it’s possible that the presentation of the view definition has simply been stopped at the 4,000 bytes mark when it does actually continue in memory. So let’s do a bigger peek of the first address we found in x$kqfvt and check the trace file:

SQL> oradebug peek 0x14598D20 4600 1
[014598D20, 014599F18) = 74736E49 65636E61 00202927 00000000 00000000 00000000 00000000 00000000 656C6573 73207463 736E692E 64695F74 612E732C 2C726464 ...

SQL> ed or19_ora_28809.trc 

I’ve actually started my peek 32 (0x20) bytes earlier than the target address because of the way that Oracle fails to format the first few peeked bytes in the trace file. By peeking a little early I’ll get the whole section that I want to see formatted into readability:

Processing Oradebug command 'peek 0x14598D20 4600 1'
[014598D20, 014599F18) = 74736E49 65636E61 00202927 00000000 00000000 ...
Dump of memory from 0x014598D34 to 0x014599F18
014598D30          00000000 00000000 00000000      [............]
014598D40 656C6573 73207463 736E692E 64695F74  [select s.inst_id]
014598D50 612E732C 2C726464 6E692E73 732C7864  [,s.addr,s.indx,s]
014598D60 75736B2E 65736573 2E732C72 7575736B  [.ksuseser,s.ksuu]
014598D70 73657364 6B2E732C 65737573 2C6F7270  [dses,s.ksusepro,]
014598D80 736B2E73 6C647575 732C6975 75736B2E  [s.ksuudlui,s.ksu]
...
014599E50 20292745 6D6F7266 6B247820 65737573  [E') from x$ksuse]
014599E60 202C7320 736B2478 2064656C 78202C65  [ s, x$ksled e, x]
014599E70 6C736B24 77207477 65687720 62206572  [$kslwt w where b]
014599E80 6E617469 2E732864 7073736B 676C6661  [itand(s.ksspaflg]
014599E90 2129312C 6120303D 6220646E 6E617469  [,1)!=0 and bitan]
014599EA0 2E732864 7375736B 676C6665 2129312C  [d(s.ksuseflg,1)!]
014599EB0 6120303D 7320646E 646E692E 2E773D78  [=0 and s.indx=w.]
014599EC0 776C736B 64697374 646E6120 6B2E7720  [kslwtsid and w.k]
014599ED0 74776C73 3D747665 6E692E65 00007864  [slwtevt=e.indx..]
014599EE0 656C6573 20207463 44444153 202C2052  [select  SADDR , ]
014599EF0 20444953 4553202C 4C414952 202C2023  [SID , SERIAL# , ]
014599F00 53445541 2C204449 44415020 2C205244  [AUDSID , PADDR ,]
014599F10 45535520 2C202352                    [ USER# ,]

I got a little lucky with my guesstimate of how much to peek and, as you can see, I’ve got the whole of the view definition (which means I can now see all the tables in the from clause and all the predicates in the where clause).

I can now do some tedious work to edit the trace file to extract the full view and make it look pretty. On the other hand you may think that’s too much like hard work so, as an alternative, you might note that the view definition references x$ksuse, x$ksled and x$kslwt in that order, which means you might get lucky if you just search the executable to see if you can find those three words very close to each other:

[oracle@linux183 trace]$ cd $ORACLE_HOME/bin

[oracle@linux183 bin]$ strings -a oracle | grep -n "x\$ksuse.*x\$ksled.*x\$kslwt"  >temp.txt

[oracle@linux183 bin]$ wc temp.txt
   1  142 4519 temp.txt

We’ve found one line of 4,511 characters (plus 8 due to the -n option) matching our grep search criteria – and it’s the full text of the view.
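The character count ties up with the figure found in x$kqfvi: 4,511 bytes of view text plus 8 extra characters, which I take to be the line-number prefix added by the -n option and the trailing newline (the 7-character prefix width is my assumption about the matching line number):

```python
# wc counts every byte of the matching line: the 4,511-byte view text,
# the grep -n prefix (assumed here to be a 7-character "NNNNNN:") and
# the trailing newline.
view_len = 4511
prefix_len = 7      # assumption: a six-digit line number plus ":"
newline = 1
assert view_len + prefix_len + newline == 4519
```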

If you want to find which other views lose the ends of their definitions a simple query will tell you:

set pagesize 60
set linesize 100
set trimspool on
set tab off

column view_name format a32
column view_definition format a60 wrap word

select  view_name, view_definition
from    v$fixed_view_definition
where   length(view_definition) = 4000
/

There were 18 of them in my 19.11.0.0 instance.

Summary

If you’re after the select statements defining any of the dynamic performance views they can be found in the view (g)v$fixed_view_definition. Unfortunately this will report only the first 4,000 bytes of the definition as will x$kqfvt, the x$ structure that exposes the defining text. (This doesn’t change, by the way, if you have set max_string_size to EXTENDED.)

The 4,000 byte limit is an artificial limit imposed by the limits on the external varchar2() type, and the information in the x$kqfvt structure actually points to a string holding the entire view definition. We can use oradebug to peek at the contents of a row in x$kqfvt using the addr column as the target address for peeking. The first 8 bytes will report the address where the view text is located, so we can use oradebug again to peek at that address (less 32 to get the whole string converted to character format) to tell Oracle to dump the entire string into the trace file.

Since it’s likely to be a tedious job to stitch the view text into a single piece from the 16 byte fragments dumped by Oracle you could use the strings o/s command on the oracle executable and search the result for a recognizable piece of text that will pick out just the view definition. If you can’t access the executable, though, the trace file may be the only option – and don’t forget that you may be able to read your trace file from v$diag_trace_file_contents through an SQL*Plus connection.

X$ row sizes

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 10:59 am GMT Mar 21,2022

Here’s a script I’ve used from time to time to help me get some ideas about how to hack through some x$ tables. It’s obviously something that can only be run by the SYS schema and it’s not something to apply arbitrarily to every x$ table as the side effects could be nasty but, on a sandbox instance with enough CPUs and a large enough SGA, it might give you some interesting clues about how Oracle is defining or accessing some of its internal structures.

The first query simply walks through the specified x$ table in index (indx) order comparing the address (addr) of the preceding row with that of the current row and reporting the difference. Some of the x$ tables are likely to give you a random collection of large numbers, which means there is no well-structured row involved. Others will give very regular patterns – though not necessarily a constant value as you move from row to row (take a look at x$ksuru, for example).

The second query takes the lowest and highest addresses of rows in the table, finds the difference and divides by one less than the number of rows in the table (i.e. allowing for the fact that the highest address is the start of a final row, not the end of a row). If the average size value matches the value reported by every other row in the output that’s a fairly strong clue that the x$ table may be a simple array of fixed sized rows.

rem
rem     Script:         x_rowsize.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2002 (probably)
rem
rem     Usage:
rem             Connect as SYS (due to x$)
rem             start x_rowsize {x tablename} {number of rows to list}
rem

set linesize 120
set pagesize  60
set trimspool on
set tab off

prompt  ================================================
prompt  Usage:  @x_rowsize {table_name} {number_of_rows}
prompt  ================================================
pause   Press return to continue

define m_xtab  = &1
define m_rowct = &2

column addr_dec         format 999,999,999,999,999
column lag_addr_dec     format 999,999,999,999,999
column row_size         format 999,999,999,999,999

spool x_rowsize

select 
        addr,
        to_number(addr,'XXXXXXXXXXXXXXXX') addr_dec, 
        to_number(lag(addr,1) over (order by indx),'xxxxxxxxxxxxxxxx') lag_addr_dec, 
        to_char(
                to_number(addr,'XXXXXXXXXXXXXXXX') -
                        to_number(lag(addr,1) over (order by indx),'xxxxxxxxxxxxxxxx'),
                        'xxxxxxxxxxxxxxxx'
        ) row_size_hex,
        to_number(addr,'XXXXXXXXXXXXXXXX') -
                to_number(lag(addr,1) over (order by indx),'xxxxxxxxxxxxxxxx') row_size,
        indx 
from    &m_xtab
where   indx <= &m_rowct
order by 
        indx
/

prompt  ========================================================
prompt  Average row size will (almost certainly) not match the
prompt  row gap listed in the above if the x$ table is segmented
prompt  ========================================================

select
        min(addr), max(addr), count(*),
        (to_number(max(addr),'xxxxxxxxxxxxxxxx') -
                to_number(min(addr),'xxxxxxxxxxxxxxxx')
        ) / (count(*) - 1) avg_row_size
from    &m_xtab
/

spool off

It’s a good idea to start with a small sample size in case a complete scan of a structure crashes the session or even the instance – so here’s the result of calling the script with the parameters “x$ksuse 10”:

================================================
Usage:  @x_rowsize {table_name} {number_of_rows}
================================================
Press return to continue


ADDR                         ADDR_DEC         LAG_ADDR_DEC ROW_SIZE_HEX                  ROW_SIZE       INDX
---------------- -------------------- -------------------- ----------------- -------------------- ----------
000000008E713678        2,389,784,184                                                                      1
000000008E715E60        2,389,794,400        2,389,784,184              27e8               10,216          2
000000008E718648        2,389,804,616        2,389,794,400              27e8               10,216          3
000000008E71AE30        2,389,814,832        2,389,804,616              27e8               10,216          4
000000008E71D618        2,389,825,048        2,389,814,832              27e8               10,216          5
000000008E71FE00        2,389,835,264        2,389,825,048              27e8               10,216          6
000000008E7225E8        2,389,845,480        2,389,835,264              27e8               10,216          7
000000008E724DD0        2,389,855,696        2,389,845,480              27e8               10,216          8
000000008E7275B8        2,389,865,912        2,389,855,696              27e8               10,216          9
000000008E729DA0        2,389,876,128        2,389,865,912              27e8               10,216         10

10 rows selected.

========================================================
Average row size will (almost certainly) not match the
row gap listed in the above if the x$ table is segmented
========================================================

MIN(ADDR)        MAX(ADDR)          COUNT(*) AVG_ROW_SIZE
---------------- ---------------- ---------- ------------
000000008E713678 000000008EDEDD70        704   10222.1451

1 row selected.

At first sight it looks as if the row size for an x$ksuse (v$session) row is 10,216 bytes – but then the average row size turns out to be different, so there’s some inconsistency somewhere in x$ksuse that we can’t see in the first few rows.

After running the report for the whole x$ksuse structure I realised that a quick and easy edit would report just the interesting bits. I folded the original query into an inline view and selected only the rows where “row_size != 10216”. Here’s the result:

ADDR                         ADDR_DEC         LAG_ADDR_DEC ROW_SIZE_HEX                  ROW_SIZE       INDX
---------------- -------------------- -------------------- ----------------- -------------------- ----------
000000008E8CAB98        2,391,583,640        2,391,571,984              2d88               11,656        177
000000008EA820B8        2,393,383,096        2,393,371,440              2d88               11,656        353
000000008EC395D8        2,395,182,552        2,395,170,896              2d88               11,656        529

That looks like an interesting pattern – especially if you’re good at mental arithmetic (or have a calculator handy) and I tell you that there are 704 rows in x$ksuse.

The way I’ve written my code the row size reported for indx = n is the length of row n-1, so out of the 704 rows in x$ksuse, the “interesting” rows are 176, 352, and 528 and when you check the arithmetic you realise:

  • 176 = 704 / 4,
  • 352 = 704 / 2,
  • 528 = 704 * 3/4
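Those boundaries can be reproduced with a trivial sketch (values copied from the instance described above):

```python
# Quick sanity check of the arithmetic: with 704 rows in x$ksuse and
# 4 CPUs, the "interesting" rows should sit at the segment boundaries.
total_rows = 704                  # rows in x$ksuse on this instance
cpu_count  = 4                    # CPUs on the test machine

segment_size = total_rows // cpu_count                 # rows per segment
boundaries   = [segment_size * n for n in range(1, cpu_count)]

print(segment_size)   # 176
print(boundaries)     # [176, 352, 528] - the rows with the odd gap
```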

Maybe it’s not a coincidence that this instance is running on 4 CPUs; moreover, with that thought in mind, if I look at v$latch_children to see if there are any latches with 4 children I find (amongst others) the following latch names:

client/application info
session allocation
session idle bit
session switching

There may be other latches that are also relevant but these 4 latches have names that suggest you might want to acquire one of them if you wanted to modify the contents of a session-related row – in other words if you wanted to modify the contents of a row in x$ksuse. So maybe Oracle has checked the cpu_count on startup and created the v$session/x$ksuse array in equal sized pieces, one for each CPU, so that it can minimise contention when trying to modify data in x$ksuse. And following that thought I’ll just add that I had set sessions = 702 in my parameter file, and Oracle had rounded that up to a multiple of 4.

The hypothesis is easy to check – just bounce the database a few times, using a pfile rather than an spfile, change the cpu_count and sessions parameter each time and see if Oracle always rounds sessions up (if needed) to a multiple of cpu_count and then splits the array into cpu_count equal pieces.
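A minimal sketch of the hypothesised start-up arithmetic – an assumption inferred from observation, not documented behaviour:

```python
# Hypothesis: on startup Oracle rounds the sessions parameter up to a
# multiple of cpu_count, then splits the x$ksuse array into cpu_count
# equal pieces. This models the observed behaviour only.
def sessions_array_layout(sessions, cpu_count):
    rounded = -(-sessions // cpu_count) * cpu_count    # ceiling to a multiple
    return rounded, rounded // cpu_count               # (array size, piece size)

print(sessions_array_layout(702, 4))   # (704, 176) - matches the test instance
```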

And that certainly seemed to be the case on my little machine. (Obviously I’m not in a position to see what would happen on a machine with 256 CPU and sessions set to 16,000 – so there might be some sanity checks for extreme cases.)

Jumping to conclusions

Going back to the output from my initial setup, it looks as if each row in x$ksuse is going to demand 10,216 bytes, and that there may be 1,440 (viz: 11,656 – 10,216) bytes needed as some sort of “header” for each segment of the array, but what sort of header could need 1,440 bytes? I can’t help noticing that if you wanted to store 176 pointers of 8 bytes each that’s a total of 1,408 bytes – so maybe the “header” is mostly a list of pointers to the rows – just like the row directory in a data block!
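A quick check of the pointer-directory guess, using the figures quoted above:

```python
# Does the extra space at each segment boundary roughly match a table of
# 8-byte pointers, one per row in the segment?
row_size = 10_216
big_gap  = 11_656                     # the gap at each segment boundary
header   = big_gap - row_size         # 1,440 bytes of unexplained space
pointers = 176 * 8                    # one 8-byte pointer per row in a segment

print(header, pointers, header - pointers)   # 1440 1408 32
```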

It’s a thought that might be heading in the right direction, but perhaps it’s a little premature. Fortunately on one of my restarts one of the gaps showed up as roughly minus 30,000,000. So I need to find an explanation that covers this case as well – but that’s okay because I soon came up with a hypothesis (that I haven’t actually confirmed with a reproducible model yet).

I know that my memory granule size is 16MB (select * from V$sgainfo). What if Oracle had allocated the first few sections of x$ksuse and then found that it didn’t have enough room left in the current granule for the next piece of the array? It would have to start using another granule. What if the granules were attached to the shared pool “from the top downwards” – the next granule would have a starting address that was 16MB lower than the current granule – so the step from the end of the current granule to the beginning of the next granule would be around 32MB backwards. Allowing for the fact that I was experimenting with numbers that produced sizes of about 2MB (200 rows @ 10,216 bytes) for each piece of the array the hypothesis is reasonably consistent with the step size of minus 30 million.
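A rough back-of-envelope check of the granule hypothesis; the “one piece already used in each granule” allowance is my assumption, chosen only to show the numbers land in the right ballpark:

```python
# If the next 16MB granule is allocated 16MB *below* the current one,
# stepping from near the top of one granule to the bottom of the next
# can be up to two granules backwards; space already consumed in each
# granule reduces the jump. Figures are the approximate ones from the text.
granule = 16 * 1024 * 1024            # 16MB granule size (from v$sgainfo)
piece   = 200 * 10_216                # ~2MB per piece of the array

max_backward_jump = -2 * granule                  # worst case
typical_jump      = -2 * granule + 2 * piece      # allowing ~one piece used per granule

print(max_backward_jump)   # -33554432
print(typical_jump)        # about -29.5 million: consistent with ~ -30,000,000
```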

Further experimentation is left as an exercise for the interested reader, but if I wanted to test the granule hypothesis I could probably experiment with carefully selected values for sessions, maybe starting the instance with a reduced granule size (parameter _ksmg_granule_size) to see if I could get two or more backwards jumps in the array. But I think I’ve pursued x$ksuse far enough for the moment. Except I have one more (generic) problem that it highlights.

It’s not that simple

If I query x$kqfta – the list of x$ tables, I see the following for x$ksuse:

SQL> execute print_table('select * from x$kqfta where kqftanam = ''X$KSUSE''')
ADDR                           : 00000000143D1DD0
INDX                           : 43
INST_ID                        : 1
CON_ID                         : 0
KQFTAOBJ                       : 4294951004
KQFTAVER                       : 42
KQFTANAM                       : X$KSUSE
KQFTATYP                       : 2
KQFTAFLG                       : 1
KQFTARSZ                       : 8
KQFTACOC                       : 113

-----------------

1 row(s) selected

The column kqftarsz is the “Row SiZe” and 8 is a long way off the 10,216 that I’ve derived from the addresses of consecutive rows in the table. So how does an 8 byte column turn into a 10,216 byte row? (I wonder if it’s describing the rows in the table of pointers to the rows, not the length of the rows themselves). But (apparently) it’s neither 8 nor 10,216 anyway because if I look at x$kqfco (the list of x$ columns) for x$ksuse I see the following:

select
        co.indx,
        co.kqfconam     column_name,
        co.kqfcosiz     column_size,
        co.kqfcooff     column_offset
from
        x$kqfta ta,
        x$kqfco co
where
        co.kqfcotab = ta.indx
and     ta.kqftanam = 'X$KSUSE'
order by
        co.kqfcooff,
        co.indx
/

      INDX COLUMN_NAME                      COLUMN_SIZE COLUMN_OFFSET
---------- -------------------------------- ----------- -------------
       503 ADDR                                       8             0
       504 INDX                                       4             0
       505 INST_ID                                    4             0
       506 CON_ID                                     2             0
       507 KSSPAFLG                                   4             0
       508 KSSPAOWN                                   8             0
...
       603 KSUSELTXIDFLG                              4          9636
       604 KSUSELTXID                                 8          9648
       615 KSUSEFLG3                                  4         10000

According to this the last column of the row starts at byte 10,000 and ends at 10,003 giving us a third possible length for the row and a gap of 212 bytes to explain; and even though there are several x$ tables starting at the same address (x$ksuse, x$ksusio, x$ksusecst, x$ksuru, x$ksusm) on the same internal structure none of them has a column definition explaining the gap.
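Lining up the three candidate sizes (figures copied from the listings above):

```python
# The three contradictory "row sizes" for x$ksuse, collected in one place.
kqftarsz = 8                 # from x$kqfta
last_col = 10_000 + 4        # offset + length of KSUSEFLG3 in x$kqfco
row_gap  = 10_216            # gap between consecutive addr values

print(kqftarsz, last_col, row_gap)
print(row_gap - last_col)    # 212 bytes still to explain
```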

It looks as if any further investigation (of x$ksuse at least) will have to depend on dumping memory to trace files and chasing pointers.

Summary

When you try to find out how big a “row” in an x$ structure is you may find several pieces of contradictory information:

  1. The rowsize (kqftarsz) reported in x$kqfta.
  2. The offset + length of the final column in the description in x$kqfco.
  3. The “gap” between consecutive rows when you query addr in the object itself.

Using oradebug to peek a couple of rows may help you uncover the truth. But you may end up chasing pointers to pointers to pointers to get to the ultimate source.

March 18, 2022

v$ v_$ v$

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 5:33 pm GMT Mar 18,2022

One of the “curiosity” questions that comes up from time to time on the various forums is the one about the cascade of names from synonym to view to object that Oracle uses to provide access to the dynamic performance views. Typically we see the pattern:

  • v$XXX is a public synonym for a stored view v_$XXX
  • v_$XXX is defined as ‘select {list of columns} from v$XXX’

So, after two steps, we seem to be back where we started. Why has Oracle done this? I came up with the hypothesis a long time ago that there were two parts to the answer; the first is a privilege requirement, the second is for convenience.

SYS is a very special user (as many people have found when running SQL in the sys schema and finding that it doesn’t behave in exactly the same way it does in every other schema) and access to the dynamic performance views is one of the most exotic features of the SYS account. Underneath the respectable veneer of the dynamic performance (v$) views lies the murky horror that is the x$ layer where a “table” may turn out to be a function call, some form of memory array, a linked list, or a hideously complex structure that needs to be filtered, fragmented, reassembled and sorted to yield something that looks like a row containing useful information – an “object” may even be a function that reads an array of pointers to linked lists of hideously complex structures.

So the dynamic performance views hide nasty things and SYS is the only schema allowed to execute the Oracle kernel code to interpret and display those things. But the developers of tools like EM may find it highly desirable to get access to the contents of the dynamic performance views – so Oracle creates “real” (i.e. stored) views in the SYS schema to expose the contents of the dynamic performance views and may grant select privileges on those stored views to ordinary users. Of course, since the views are created by SYS the stored queries defining the view contents operate with the privileges of SYS, which means the views can execute the kernel code and return the correct results.

Everyone happy so far?

So now a developer writes a really terrific query that they test very carefully against the dynamic performance views to make sure it’s producing the right results. And when it’s working they have to write a new version of the query using the names of the stored views rather than the names of the dynamic performance views because they also want to include their query in an Enterprise Manager / Grid Control monitor screen.

But why have two versions of a query when, with a suitable set of synonyms, a single version of the query will work? Choose your synonym names carefully and when the super query is run by SYS the parser will interpret names as direct references to the dynamic performance views, and when it’s run by any other user (who’s had select granted on the stored views) it will translate names of synonyms to names of stored views and onwards to the names of dynamic performance views.

So: the stored views make it possible for non-SYS users to run SQL with the privileges of SYS; the cunning use of synonyms means the same SQL text can be run by SYS and non-SYS users and mean the same thing.
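As a toy illustration of the cascade, here’s a sketch of how the same name resolves differently for SYS and for an ordinary user; the dictionaries stand in for the data dictionary and the specific names are illustrative:

```python
# Toy model of the synonym -> stored view -> fixed view cascade.
# SYS resolves the v$ name directly to the fixed view; everyone else
# goes via the public synonym and the SYS-owned stored view.
public_synonyms = {'V$SQLAREA': 'V_$SQLAREA'}   # synonym -> stored view
stored_views    = {'V_$SQLAREA': 'V$SQLAREA'}   # stored view -> fixed view it selects from
fixed_views     = {'V$SQLAREA'}                 # names the kernel resolves directly

def resolve(name, user):
    if user == 'SYS' and name in fixed_views:
        return ['fixed view ' + name]
    path = []
    if name in public_synonyms:
        path.append('synonym ' + name)
        name = public_synonyms[name]
    if name in stored_views:
        path.append('stored view ' + name)
        name = stored_views[name]
    path.append('fixed view ' + name)
    return path

print(resolve('V$SQLAREA', 'SYS'))     # ['fixed view V$SQLAREA']
print(resolve('V$SQLAREA', 'SCOTT'))   # synonym -> stored view -> fixed view
```

The point of the model: the same SQL text names the same ultimate object for every user, but only via SYS-owned objects for non-SYS users.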

Lagniappe

There’s one dynamic performance view that breaks the general pattern, the one about sequences. Try running the following as SYS – the first triplet demonstrates the general pattern – the second doesn’t:

select 'dynamic' object_type, kqfvinam     object_name from x$kqfvi       where kqfvinam     like 'V%SQLAREA'
union all
select 'real view',           view_name                from dba_views     where view_name    like 'V%SQLAREA'
union all
select 'synonym',             synonym_name             from dba_synonyms  where synonym_name like 'V%SQLAREA';

select 'dynamic' object_type, kqfvinam     object_name from x$kqfvi       where kqfvinam     like 'V%SEQUENCES'
union all
select 'real view',           view_name                from dba_views     where view_name    like 'V%SEQUENCES'
union all
select 'synonym',             synonym_name             from dba_synonyms  where synonym_name like 'V%SEQUENCES';

The dynamic performance view v$_sequences doesn’t follow the pattern. There is no associated stored view, hence no possibility of a working synonym. (The dynamic performance view is also unusual in having an underscore after the v$, maybe that’s why some automatic code to generate the view and synonym didn’t work for it ;)

You could argue that v$_lock and v$_lock1 also break the pattern, but they’re just Oracle flailing around frantically but deliberately invisibly under v$lock, which does follow the pattern.

You might also point out that the one pair of dynamic performance views that start with GO$/O$ also breaks the pattern as the stored view and synonym change the ‘O’s to ‘V’s.

March 16, 2022

Excavating x$

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 11:13 pm GMT Mar 16,2022

If you ever feel the need to dig down into the “dynamic performance views (v$)”, and then go a little deeper to get to the “internal structures (x$)” there are 5 objects in the list of x$ that you will eventually want to know about.

  • x$kqfta – headline information about the x$ structures – name, size, column count.
  • x$kqfvi – a list of the names of the dynamic performance views (mostly pairs of v$/gv$)
  • x$kqfvt – a list of the select statements that define the views in x$kqfvi
  • x$kqfdt – a list of “derived tables”: a cross between synonyms and views of a few of the structures in x$kqfta
  • x$kqfco – a list of the “columns” in each of the tables in x$kqfta (but not x$kqfvi and x$kqfdt)

I’ve listed the tables in this order because that’s the way they’re indexed (0 – 4) in x$kqfta, but your journey is likely to start from the view v$fixed_table, or the view v$fixed_view_definition. From a session connected as SYS to a cdb$root in 19.11.0.0:

SQL> describe v$fixed_view_definition
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 VIEW_NAME                                          VARCHAR2(128)
 VIEW_DEFINITION                                    VARCHAR2(4000)
 CON_ID                                             NUMBER

SQL> select view_definition
  2  from v$fixed_view_definition
  3  where view_name = 'V$FIXED_VIEW_DEFINITION'
  4  /

VIEW_DEFINITION
--------------------------------------------------------------------------------
select  VIEW_NAME , VIEW_DEFINITION, CON_ID from GV$FIXED_VIEW_DEFINITION where
inst_id = USERENV('Instance')


1 row selected.

SQL> select view_definition
  2  from v$fixed_view_definition
  3  where view_name = 'GV$FIXED_VIEW_DEFINITION'
  4  /

VIEW_DEFINITION
--------------------------------------------------------------------------------
select i.inst_id,kqfvinam,kqftpsel, i.con_id   from x$kqfvi i, x$kqfvt t where i
.indx = t.indx

There’s a fairly common pattern in the fixed view definitions – they often come in pairs, one starting GV$ the other starting V$, and the V$ is usually just “select most of the columns from the gv$ for this instance”. The GV$ views are the “Global” views relevant to RAC, the V$ views are local to the current instance. (Just pretend you haven’t noticed that Oracle uses the deprecated userenv() function to find the instance id rather than the sys_context(‘userenv’,{variable}) function.)

If you are running RAC and query one of the GV$ views you’ll find that every other instance starts a Parallel Execution process to create its local results for the calling instance. This PX process doesn’t come from (and will ignore the limit on) the normal pool of PX processes and will have a name like PZnn rather than the traditional Pnnn.

You’ll note in the result from the last query that gv$fixed_view_definition is a simple join of x$kqfvi and x$kqfvt on the indx column. There is a problem with this view – hidden in plain sight – but I’ll have to come back to that later.

We can go through a similar set of queries to find the definition of v$fixed_table:

SQL> select view_definition
  2  from v$fixed_view_definition
  3  where view_name = 'V$FIXED_TABLE'
  4  /

VIEW_DEFINITION
--------------------------------------------------------------------------------
select  NAME , OBJECT_ID , TYPE , TABLE_NUM, CON_ID from GV$FIXED_TABLE where in
st_id = USERENV('Instance')


1 row selected.

SQL> select view_definition
  2  from v$fixed_view_definition
  3  where view_name = 'GV$FIXED_TABLE'
  4  /

VIEW_DEFINITION
--------------------------------------------------------------------------------
select inst_id,kqftanam, kqftaobj, 'TABLE', indx, con_id from x$kqfta union all
select inst_id,kqfvinam, kqfviobj, 'VIEW', 65537, con_id from x$kqfvi union all
select inst_id,kqfdtnam, kqfdtobj, 'TABLE', 65537, con_id from x$kqfdt


1 row selected.

So the “fixed tables” are really the combination of the basic structures, views on the structures, and the “derived tables”. The objects in the UNION ALL each have a unique object_id in a range close to power(2,32) and (for the tables in x$kqfta) if you make a call to dbms_stats.gather_table_stats() that object_id will be used to store the statistics in the data dictionary.
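A quick check on the quoted object_ids – sample values copied from the listings in this post:

```python
# The fixed-object ids quoted in the listings all sit just below 2^32,
# i.e. "in a range close to power(2,32)" as noted above.
ids = [4294951004, 4294952735, 4294956138]   # sample kqftaobj / kqfdtobj values

for obj_id in ids:
    print(obj_id, 2**32 - obj_id)            # each within about 20,000 of 2^32
```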

So what are the “derived tables”? There are two ways to address this question – query it, or find a place where one is used. Here’s a full listing of x$kqfdt (still 19.11.0.0):

SQL> select * from x$kqfdt;

ADDR                   INDX    INST_ID     CON_ID   KQFDTOBJ KQFDTNAM                         KQFDTEQU
---------------- ---------- ---------- ---------- ---------- -------------------------------- ---------------------------
000000001454E6C0          0          1          0 4294952735 X$KSLLTR_CHILDREN                X$KSLLTR
000000001454E6E8          1          1          0 4294952736 X$KSLLTR_PARENT                  X$KSLLTR
000000001454E710          2          1          0 4294956013 X$KSLLTR_OSP                     X$KSLLTR
000000001454E738          3          1          0 4294956014 X$KSLWSC_OSP                     X$KSLWSC
000000001454E760          4          1          0 4294951073 X$KCVFHONL                       X$KCVFH
000000001454E788          5          1          0 4294951074 X$KCVFHMRR                       X$KCVFH
000000001454E7B0          6          1          0 4294951440 X$KCVFHALL                       X$KCVFH
000000001454E7D8          7          1          0 4294951056 X$KGLTABLE                       X$KGLOB
000000001454E800          8          1          0 4294951057 X$KGLBODY                        X$KGLOB
000000001454E828          9          1          0 4294951058 X$KGLTRIGGER                     X$KGLOB
000000001454E850         10          1          0 4294951059 X$KGLINDEX                       X$KGLOB
000000001454E878         11          1          0 4294951060 X$KGLCLUSTER                     X$KGLOB
000000001454E8A0         12          1          0 4294951061 X$KGLCURSOR                      X$KGLOB
000000001454E8C8         13          1          0 4294952684 X$KGLCURSOR_CHILD_SQLID          X$KGLOB
000000001454E8F0         14          1          0 4294952680 X$KGLCURSOR_CHILD_SQLIDPH        X$KGLOB
000000001454E918         15          1          0 4294952683 X$KGLCURSOR_CHILD                X$KGLOB
000000001454E940         16          1          0 4294953372 X$KGLCURSOR_PARENT               X$KGLOB
000000001454E968         17          1          0 4294953759 X$KGLSQLTXL                      X$KGLOB
000000001454E990         18          1          0 4294956135 X$ALL_KQLFXPL                    X$KQLFXPL
000000001454E9B8         19          1          0 4294953124 X$KKSSQLSTAT_PLAN_HASH           X$KKSSQLSTAT
000000001454E9E0         20          1          0 4294953231 X$ZASAXTD1                       X$ZASAXTAB
000000001454EA08         21          1          0 4294953232 X$ZASAXTD2                       X$ZASAXTAB
000000001454EA30         22          1          0 4294953233 X$ZASAXTD3                       X$ZASAXTAB
000000001454EA58         23          1          0 4294951597 X$JOXFS                          X$JOXFT
000000001454EA80         24          1          0 4294951598 X$JOXFC                          X$JOXFT
000000001454EAA8         25          1          0 4294951599 X$JOXFR                          X$JOXFT
000000001454EAD0         26          1          0 4294951621 X$JOXFD                          X$JOXFT
000000001454EAF8         27          1          0 4294952364 X$JOXOBJ                         X$JOXFT
000000001454EB20         28          1          0 4294952365 X$JOXSCD                         X$JOXFT
000000001454EB48         29          1          0 4294952366 X$JOXRSV                         X$JOXFT
000000001454EB70         30          1          0 4294952367 X$JOXREF                         X$JOXFT
000000001454EB98         31          1          0 4294952368 X$JOXDRC                         X$JOXFT
000000001454EBC0         32          1          0 4294952369 X$JOXDRR                         X$JOXFT
000000001454EBE8         33          1          0 4294952370 X$JOXMOB                         X$JOXFM
000000001454EC10         34          1          0 4294952371 X$JOXMIF                         X$JOXFM
000000001454EC38         35          1          0 4294952372 X$JOXMIC                         X$JOXFM
000000001454EC60         36          1          0 4294952373 X$JOXMFD                         X$JOXFM
000000001454EC88         37          1          0 4294952374 X$JOXMMD                         X$JOXFM
000000001454ECB0         38          1          0 4294952375 X$JOXMAG                         X$JOXFM
000000001454ECD8         39          1          0 4294952376 X$JOXMEX                         X$JOXFM
000000001454ED00         40          1          0 4294956138 X$ALL_ASH                        X$ASH
000000001454ED28         41          1          0 4294956123 X$ALL_KESWXMON                   X$KESWXMON
000000001454ED50         42          1          0 4294956126 X$ALL_KESWXMON_PLAN              X$KESWXMON_PLAN

43 rows selected.

What it comes down to is that there are several different ways of addressing some of the base objects. x$kglob is the nicest example of this with 11 “equivalences”. I don’t know why Oracle has taken this approach – but it may reflect the fact that some of the underlying structures don’t look anything like “rows” and different derived tables expose different components of the same underlying structure in mutually incompatible ways. Here’s an example showing one of these derived tables making an appearance:

SQL> explain plan for
  2  select  child_number, plan_hash_value, sql_text 
  3  from V$sql where sql_id = '7m239n32gftgh';

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------
Plan hash value: 1346707802

-----------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                           |     1 |   549 |     0   (0)| 00:00:01 |
|*  1 |  FIXED TABLE FIXED INDEX| X$KGLCURSOR_CHILD (ind:2) |     1 |   549 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("KGLOBT03"='7m239n32gftgh' AND "INST_ID"=USERENV('INSTANCE'))

13 rows selected.

When you try to find x$kglcursor_child in the fixed tables (x$kqfta) it’s not there – but a check against x$kqfdt tells us that it’s just another way of addressing x$kglob.

This plan also shows an interesting feature in the table access Operation, which is reported as “Fixed table fixed index” with “ind:2” appended when we check the Name column of the plan.

In this context “index” has a non-standard meaning – x$kglob is a memory structure helping to describe the contents of the Library Cache, so we shouldn’t expect to see anything that looks like a traditional index. Here it just means that Oracle has a particularly efficient access path if your predicate is of the right type. We could guess in this case that it’s the predicate “kglobt03 =’7m239n32gftgh'” that Oracle is using, but it would be nice to check, and nice to know what other “indexed” access paths there are into the structure (this is, after all, “ind:2”, so there ought to be an “ind:1”). Conveniently Oracle has a dynamic performance view called v$indexed_fixed_column that we can query:

SQL> select * from v$indexed_fixed_column where table_name = 'X$KGLOB';

TABLE_NAME                INDEX_NUMBER COLUMN_NAME          COLUMN_POSITION     CON_ID
------------------------- ------------ -------------------- --------------- ----------
X$KGLOB                              1 KGLNAHSH                           0          0
X$KGLOB                              2 KGLOBT03                           0          0

2 rows selected.

Note that the Index_Number column tells us that kglobt03 is defined as the 2nd index on the table – which is consistent with the “ind:2” in the plan. Of course, it’s not immediately obvious, and can take a bit of effort to work out, that kglnahsh corresponds to v$sql.hash_value and kglobt03 corresponds to v$sql.sql_id; but it’s nice to know that we can find all the efficient access paths if we need to tune a bit of “expert” SQL that we’ve discovered on the internet.

Even more exciting – we can go back to v$fixed_view_definition and find out what structure(s) are sitting underneath v$indexed_fixed_column:

SQL> select view_definition
  2  from v$fixed_view_definition
  3  where view_name = 'GV$INDEXED_FIXED_COLUMN'
  4  /

VIEW_DEFINITION
--------------------------------------------------------------------------------
select c.inst_id,kqftanam, kqfcoidx, kqfconam, kqfcoipo, c.con_id  from x$kqfco
c, x$kqfta t where t.indx = c.kqfcotab and kqfcoidx != 0

It’s a join between x$kqfta and x$kqfco – the fifth item on our list: the complete list of x$ columns. And you just can’t help wondering what results you would get if you removed the filter on column kqfcoidx. So here’s a little script I wrote a long time ago to do just that:

rem
rem     Script:         x_desc.sql
rem     Author:         Jonathan Lewis
rem     Dated:          September 2002
rem     Purpose:        Describe the X$ objects
rem
rem     Notes:
rem     This script can only be run by SYS
rem

set newpage 0
set pagesize 60
set linesize 132
set trimspool on
set tab off

break on kqftanam skip page
column kqftanam new_value m_table noprint

column  kqfconam        heading "Column" format a34
column  kqfcodty        heading "Ext Type" 
column  kqfcotyp        heading "Int Type"
column  kqfcomax        heading "Array Max"
column  kqfcolsz        heading "Len Size"
column  kqfcolof        heading "Len Offset"
column  kqfcosiz        heading "Col Size"
column  kqfcooff        heading "Offset"
column  kqfcoidx        heading "Index"
column  kqfcoipo        heading "Idx Col"

ttitle -
        skip 1 -
        center m_table -
        skip 2

spool x_desc

select
        ta.kqftanam,
        co.kqfconam,
        co.kqfcodty,
        co.kqfcotyp,
        co.kqfcomax,
        co.kqfcolsz,
        co.kqfcolof,
        co.kqfcosiz,
        co.kqfcooff,
        co.kqfcoidx,
        co.kqfcoipo
from
        x$kqfta ta,
        x$kqfco co
where
        co.kqfcotab = ta.indx
order by
        ta.kqftanam,
        co.kqfcooff,
        co.indx
;

spool off

And here’s the result for just one “table” in 19.11.0.0 – x$ksled (corresponding to v$event_name) together with a query to report its column count and row size from x$kqfta.

Column                       Ext Type   Int Type  Array Max   Len Size Len Offset   Col Size     Offset      Index    Idx Col
-------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
ADDR                               23          9          0          0          0          8          0          1          0
INDX                                2         11          0          0          0          4          0          2          0
INST_ID                             2         11          0          0          0          4          0          0          0
CON_ID                              2         11          0          0          0          2          0          0          0
KSLEDNAM                            1          6          0          0          0         64          0          0          0
KSLEDCLASSID                        2          7          0          0          0          4          0          0          0
KSLEDCLASS                          1          7          0          0          0         64          0          0          0
KSLEDP1                             1          6          0          0          0         64          8          0          0
KSLEDP2                             1          6          0          0          0         64         16          0          0
KSLEDP3                             1          6          0          0          0         64         24          0          0
KSLEDFLG                            2         11          0          0          0          4         32          0          0
KSLEDCLASS#                         2          0          0          0          0          2         36          0          0
KSLEDDSP                            1          6          0          0          0         64         40          0          0
KSLEDHASH                           2         11          0          0          0          4         52          0          0

SQL> select kqftarsz, kqftacoc from X$kqfta where kqftanam = 'X$KSLED';

  KQFTARSZ   KQFTACOC
---------- ----------
        56         14

In this particular case you can see that the column count (14) matches – I haven’t checked if this is always true, but I suspect it is – and in this case we can see from a quick check of the highest offset (52) plus its column size (4) that the row size (56) matches as well, but that’s not always the case as we shall see in other examples.
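The cross-check as a one-liner sketch (figures from the listing above):

```python
# Cross-checking x$kqfta's figures for x$ksled against the x$kqfco
# listing: row size = highest column offset + that column's size, and
# the column count matches the number of listed columns.
column_count   = 14        # columns listed for x$ksled
highest_offset = 52        # KSLEDHASH
its_col_size   = 4

row_size = highest_offset + its_col_size
print(column_count, row_size)   # 14 56 - matching kqftacoc and kqftarsz
```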

There are, inevitably, some puzzles in this output.

  • How can we have 7 columns all starting at the same zero offset?
  • How can you have a column with an offset of 16 (ksledp2) when the previous column has an offset of 8 and a length of 64?
  • Might there be an interesting 2 bytes of information between ksledclass# (offset 36 length 2) and ksleddsp (offset 40)?

I won’t attempt to unravel the mystery any further at this point, but I will just point out that there are two “indexes” on this “table”. To access a row efficiently you can either query by addr (its 64-bit address) or by indx (which suggests it’s stored in memory as a simple, non-segmented array – which is another detail I’ll explain at a later date).

Coming soon

  • The problem with v$fixed_view_definition – and how to circumvent it: now published
  • Three ways of checking row sizes – which may give different results: now published
  • Notes on interpreting the columns in x$kqfco.

I will leave you with one last thought – the X$ “tables” fall into 4 different categories and you can identify which category that is by checking the address (addr) of the first row of each table (provided the table has some data in it, of course). And here’s a silly little detail – I think every x$ table starts at indx = 1, except x$ksupr which starts at indx = 0.

March 11, 2022

Session count

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 12:38 pm GMT Mar 11,2022

This note is an extended answer to a question that Kyle Hailey raised on the oracle-l list server a couple of days ago. Rather than repeat the question immediately, I’ll start with a little script (to be run by a suitably privileged account – which eventually will have to be SYS) and then tell you the question.

rem
rem     Script:         xksuse.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2022
rem     Purpose:        
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem
rem     Notes:
rem     Has to be run by sys in the cdb$root
rem
rem     See also:
rem     https://www.freelists.org/post/oracle-l/sysmetric-Session-Count-Count-from-vsession
rem

select value from v$sysmetric where metric_name = 'Session Count';

select current_utilization from v$resource_limit where resource_name = 'sessions';

select count(*) from v$session;

Here’s a set of results taken from a test instance of 19.11.0.0

     VALUE
----------
        78

CURRENT_UTILIZATION
-------------------
                 78

  COUNT(*)
----------
        54

The question on oracle-l was: “why doesn’t v$sysmetric agree with the count of v$session?”

This raises an interesting question about licensing. I don’t really pay much attention to licensing since I don’t run a production Oracle database any more, but I have a vague memory that for some versions there were things like “named user” licenses and “concurrent sessions” licenses – and there are a couple of parameters you can set to avoid the risk of breaching the license agreement:

Parameter                       Description   
-------------------------------------------------------------------------------------------------
license_max_sessions            maximum number of non-system user sessions allowed
license_max_users               maximum number of named users that can be created in the database

Do these license agreements still exist? Has anyone signed up for them?

I haven’t set either of these parameters in my database so they’ve defaulted to show zero, but by running up several extra sessions and checking the v$sysmetric entry “Session Limit %” I could infer that the default value for license_max_sessions was 1,000 (for my instance on my machine, possibly affected by other parameter settings), and that the limit percentage was based on the v$sysmetric value, not a count from v$session. (And if it had been based on the v$session count that would have been almost as bad since the description of the parameter is “non-system user sessions” – and of the 54 sessions reported above only 2 of them were of type “USER”.)
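As a back-of-envelope sketch of that inference (note: the 7.8% figure below is an invented example value, not something my instance reported):

```python
# Hypothetical sketch: inferring the default license_max_sessions
# from the "Session Limit %" metric. The 7.8 figure is an invented
# example value, not one taken from my instance.
session_count = 78          # v$sysmetric "Session Count"
session_limit_pct = 7.8     # v$sysmetric "Session Limit %" (assumed)

inferred_limit = round(session_count * 100 / session_limit_pct)
print(inferred_limit)       # 1000
```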

The Geek Bit

Before Kyle mentioned the license issue I was only interested in finding out why there was a difference in the counts. The starting point for this was to look at the x$ structure underlying v$session – and the critical one is x$ksuse, although for some time now the view definition has included a join to two other x$ structures: x$kslwt (v$session_wait) and x$ksled (v$event_name).

If you want to see the definition of v$session (or, rather, gv$session) you will run into a problem if you query v$fixed_view_definition because the definition of v$session is longer than the varchar2(4000) limit.

A workaround to this problem is to extract the full definition from the oracle executable which, for Unix systems, you can do with a command like:

strings -a oracle | grep -inT "x\$ksuse" | fold >temp.txt

You’ll have to search the (fairly small) text file to find the right definition, of course, as there are several dynamic performance views that include x$ksuse or x$ things with names that start the same way. When you’ve identified the correct definition you’ll note that there are two predicates applied to the structure to generate v$session (and these have not changed since at least 8.1.7.4 – which is the oldest listing of the views I have stored):

where
        bitand(s.ksspaflg,1)!=0 
and     bitand(s.ksuseflg,1)!=0
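To see how the two predicates interact, here’s a minimal Python sketch (the flag pairs are invented for illustration) modelling which x$ksuse rows each test would pick up – only rows with both bits set appear in v$session:

```python
# Invented (ksspaflg, ksuseflg) pairs, modelling the two v$session
# predicates to show why separate counts of each can differ from
# the v$session count.
rows = [
    (1, 1),  # normal session: both bits set, visible in v$session
    (1, 0),  # "shadow" entry: S class only
    (0, 1),  # stale slave entry: U class only
    (0, 0),  # free slot: counted by neither
]

in_v_session = [r for r in rows if r[0] & 1 and r[1] & 1]
s_class = [r for r in rows if r[0] & 1]   # bitand(ksspaflg,1) != 0
u_class = [r for r in rows if r[1] & 1]   # bitand(ksuseflg,1) != 0

print(len(in_v_session), len(s_class), len(u_class))  # 1 2 2
```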

Based on these predicates I wrote a simple script to report sessions (x$ entries) that matched each of the predicates separately:

select 'U' flag, count(*) from x$ksuse where bitand(ksuseflg,1)!=0
union all
select 'S' flag, count(*) from x$ksuse where bitand(ksspaflg,1)!=0
/

F   COUNT(*)
- ----------
U         61
S         78

The count of 78 S types matches (maybe coincidentally) the v$resource_limit and v$sysmetric figures, but there’s an interesting excess built into the U types when compared to the count from v$session. So the next step is to drill into the detail a bit.

To drill down I wrote a query to list the program column (ksusepnm) with a count of how many rows there were for that program name in the union all. For many program names we could expect to see the answer 2, one row of type U and one of type S, but there are likely to be many end user sessions showing some format of connection mechanism such as (TNS V1-V3). Here’s the script with a predicate showing only the “interesting” rows (i.e. the ones where the result is not 2):

set null n/a

column ksusepnm format a40

select ksusepnm, count(*) from (
select 'U' flag, ksusepnm from x$ksuse where bitand(ksuseflg,1)!=0
union all
select 'S' flag, ksusepnm from x$ksuse where bitand(ksspaflg,1)!=0
)
group by ksusepnm
having count(*) != 2
order by count(*)
/

KSUSEPNM                                   COUNT(*)
---------------------------------------- ----------
oracle@linux183.localdomain (P001)                1
oracle@linux183.localdomain (P002)                1
oracle@linux183.localdomain (J00B)                1
oracle@linux183.localdomain (P000)                1
oracle@linux183.localdomain (MZ00)                1
oracle@linux183.localdomain (MZ01)                1
oracle@linux183.localdomain (P003)                1
oracle@linux183.localdomain (SCMN)                4
sqlplus@linux183.localdomain (TNS V1-V3)          4
n/a                                              24

10 rows selected.

We can ignore the sqlplus rows: they correspond to the two sessions (each reported twice) that are visible in v$session. Similarly we can ignore the SCMN rows, of which there are two in my v$session at present, but we have 24 rows with no program, and 7 rows that appear only in one of the two classes.

If we take the 7 rows where the count is only 1 we note that they are all named for “slave” processes: Jnnn are job queue slaves, Pnnn are parallel execution slaves, and MZnn are manageability monitor (MMON) slaves which, the operating system tells me, don’t currently exist.

The interesting thing is the set of 24 rows that have no associated program; and to look at those I’m just going to list the whole dataset rather than counting it and (since the volume is small in my case) examine the results by eye.

select 'U' flag, ksusepnm from x$ksuse where bitand(ksuseflg,1)!=0
union all
select 'S' flag, ksusepnm from x$ksuse where bitand(ksspaflg,1)!=0
order by 1,2
/

I won’t reproduce the results but the key detail was that all the rows where the program name was null were from the S class (and that’s not too surprising, really).

The next (and final, so far) check was to add a few columns to the simple query and change the sort order to see if anything stood out:

break on flag 
compute count of ksspatyp on flag

select 
        'U' flag, indx, ksuseser, ksspatyp, ksusepnm, bitand(ksspaflg,1) spaflg, bitand(ksuseflg,1) useflag, con_id, ksusepro 
from    x$ksuse 
where   bitand(ksuseflg,1)!=0
union all
select 
        'S' flag, indx, ksuseser, ksspatyp, ksusepnm, bitand(ksspaflg,1) spaflg, bitand(ksuseflg,1) useflag, con_id, ksusepro 
from    x$ksuse 
where   bitand(ksspaflg,1)!=0
order by 
        flag, ksusepro, ksusepnm
/

The break/compute commands allow me to check that the totals for the S class and U class still matched with the original 78 and 61 – dynamic structures can change very quickly, and some of the oddities were about background processes. The interesting rows in the U class were as follows:

F       INDX   KSUSESER   KSSPATYP KSUSEPNM                                     SPAFLG    USEFLAG     CON_ID KSUSEPRO
- ---------- ---------- ---------- ---------------------------------------- ---------- ---------- ---------- ----------------
U        533          0          0 oracle@linux183.localdomain (J00B)                0          1          1 00
         546          0          0 oracle@linux183.localdomain (MZ00)                0          1          0 00
         539          0          0 oracle@linux183.localdomain (MZ01)                0          1          0 00
         537          0          0 oracle@linux183.localdomain (P000)                0          1          0 00
         525          0          0 oracle@linux183.localdomain (P001)                0          1          0 00
         544          0          0 oracle@linux183.localdomain (P002)                0          1          0 00
          49          0          0 oracle@linux183.localdomain (P003)                0          1          0 00

Basically this looks like a set of processes which used to be “user” processes, which have terminated and shut down, but their entries in x$ksuse have not been completely “cleaned” out. The numbers are nice because we have 61 rows in this class, 7 rows show “no serial#, no process”, and v$session is reporting 54 sessions, which is the result of 61 minus 7.
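The arithmetic is worth spelling out, since it will matter again when we look at the S class:

```python
# Checking the U-class arithmetic from the figures above.
u_class = 61        # rows where bitand(ksuseflg,1) != 0
stale_slaves = 7    # entries with serial# = 0 and no process address

print(u_class - stale_slaves)   # 54 - the count(*) from v$session
```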

So what about the S class rows? I’ve cunningly sorted by ksusepro which, if you hadn’t guessed, is the paddr for the process supporting the session. Do we have any cases where multiple sessions seem to be supported by a single process? This is where I viewed the output and noticed that I could show the interesting detail with a simple call to grep plus a little cosmetic editing:

grep -nT -B+1 "                                " xksuse.lst >temp.txt

                 INDX   KSUSESER   KSSPATYP KSUSEPNM                                     SPAFLG    USEFLAG     CON_ID KSUSEPRO
                 ----   --------   -------- -------------------------------------------- ------    -------     ------ ----------------
 270    -           3      62786          1 oracle@linux183.localdomain (GEN0)                1          1          0 000000008E567480
 271    :          26       4581          1                                                   1          0          0 000000008E567480
 274    -         504       2420          1 oracle@linux183.localdomain (GEN1)                1          1          0 000000008E56B4D0
 275    :         519      53839          1                                                   1          0          0 000000008E56B4D0
 278    -           6      44715          1 oracle@linux183.localdomain (OFSD)                1          1          0 000000008E56F520
 279    :          27      60151          1                                                   1          0          0 000000008E56F520
 281    -           4      65197          1 oracle@linux183.localdomain (DBRM)                1          1          0 000000008E572000
 282    :          25      41180          1                                                   1          0          0 000000008E572000
 294    -          13      25844          1 oracle@linux183.localdomain (W000)                1          1          0 000000008E582140
 295    :          14      42325          1                                                   1          0          0 000000008E582140
 296    -         513      33860          1 oracle@linux183.localdomain (LREG)                1          1          0 000000008E5836B0
 297    :         520      56123          1                                                   1          0          0 000000008E5836B0
 298    -          28       5509          1 oracle@linux183.localdomain (W005)                1          1          0 000000008E584C20
 299    :          29      49967          1                                                   1          0          0 000000008E584C20
 301    -         515      19909          1 oracle@linux183.localdomain (MMON)                1          1          0 000000008E588C70
 302    :         516       3981          1                                                   1          0          0 000000008E588C70
 303    -          17       5100          1 oracle@linux183.localdomain (MMNL)                1          1          0 000000008E58A1E0
 304    :          32      46728          1                                                   1          0          0 000000008E58A1E0
 306    -          37      30525          1 oracle@linux183.localdomain (M000)                1          1          0 000000008E58F7A0
 307    :          31      18290          1                                                   1          0          0 000000008E58F7A0
 308    -         523       2817          1 oracle@linux183.localdomain (M002)                1          1          0 000000008E590D10
 309    :         524      58038          1                                                   1          0          0 000000008E590D10
 310    -          19      30164          1 oracle@linux183.localdomain (M001)                1          1          0 000000008E592280
 311    :          35      25427          1                                                   1          0          0 000000008E592280
 315    -         518      20043          1 oracle@linux183.localdomain (AQPC)                1          1          0 000000008E598DB0
 316    :         530      24619          1                                                   1          0          0 000000008E598DB0
 317    -          15      27947          1 oracle@linux183.localdomain (M004)                1          1          0 000000008E59A320
 318    :          45      37304          1                                                   1          0          0 000000008E59A320
 319    -         521      58123          1 oracle@linux183.localdomain (CJQ0)                1          1          0 000000008E59E370
 320    :         522      32179          1                                                   1          0          0 000000008E59E370
 321    -          23      17837          1 oracle@linux183.localdomain (W003)                1          1          0 000000008E59F8E0
 322    :          30      24152          1                                                   1          0          0 000000008E59F8E0
 326    -         529      54638          1 oracle@linux183.localdomain (M003)                1          1          0 000000008E5A0E50
 327    :         526       9248          1                                                   1          0          0 000000008E5A0E50
 328    -          16      23626          1 oracle@linux183.localdomain (W001)                1          1          0 000000008E5A23C0
 329    :          20      52434          1                                                   1          0          0 000000008E5A23C0
 332    -         531      31198          1 oracle@linux183.localdomain (QM02)                1          1          0 000000008E5A6410
 333    :         532      53916          1                                                   1          0          0 000000008E5A6410
 334    -          38      28119          1 oracle@linux183.localdomain (W006)                1          1          0 000000008E5A7980
 335    :          39      55963          1                                                   1          0          0 000000008E5A7980
 336    -          42      46149          1 oracle@linux183.localdomain (W007)                1          1          0 000000008E5AA460
 337    :          43      45287          1                                                   1          0          0 000000008E5AA460
 341    -          47      36878          1 oracle@linux183.localdomain (W002)                1          1          0 000000008E5BD080
 342    :          48      26536          1                                                   1          0          0 000000008E5BD080
 343    -         538      50713          1 oracle@linux183.localdomain (Q002)                1          1          0 000000008E5BE5F0
 344    :         517      40400          1                                                   1          0          0 000000008E5BE5F0
 345    -          34      27163          1 oracle@linux183.localdomain (W004)                1          1          0 000000008E5C5120
 346    :          44      19000          1                                                   1          0          0 000000008E5C5120

Every single one of the S class with a blank program name is “sharing” a process with one of the background processes. In many of the pairs the row with a named program is a slave process (Mnnn, Wnnn etc.) but that’s far from a uniform pattern. I could imagine an argument that says Oracle wants to “pre-allocate” a spare session linked to an existing process in some special cases so that a program can switch to the alternate session (memory structures) if the first session gets corrupted in some way – but I can’t really see that as relevant to any slave processes, while I can think of a couple more programs where this strategy would be more important.

Adding a little weight to the “shadow” idea (or maybe just highlighting a bug) it’s interesting to note that a number of the pairs show consecutive values for the indx (SID) – as if Oracle has created two sessions at the same time.

All I can say at the moment, though, is that we have a consistent pattern for all the excess “invisible” sessions, and the strange entries make up the difference between the v$session count and the v$sysmetric report.

Summing up:

v$session shows a count of 54 but digging into x$ksuse for “user sessions” I can see a further 7 sessions; but the pattern for these sessions is simply “memory structure doesn’t need to be cleaned out after process terminated” so they have nothing to do with any counting anomaly.

v$resource_limit and v$sysmetric report a value that is larger than the v$session count by 24; and we can find 24 rows in x$ksuse which (a) “mirror” 24 background sessions and (b) are linked to processes (v$process.addr) but don’t have a program name (v$session.program) and aren’t flagged as user processes. So it looks as if we can (partially) explain why the two views are over-reporting.

There are some indications that if you are licensing by sessions (if that’s still possible) then the number used for checking the limit is not consistent with the description of the relevant parameter. A large number of background sessions is included in the v$sysmetric figure that seems to be the reference record.

Footnote:

I have a little oddity to pursue in 21.3.0.0, where my v$sysmetric.value was one less than the v$resource_limit current_utilization. The same “shadow” process strategy was visible though, and v$session count plus the “shadow” count agreed with v$resource_limit.

March 7, 2022

SYSAUX Occupants

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 11:32 am GMT Mar 7,2022

This is a draft note that’s been hanging around a very long time for a final edit and print. I was prompted to rediscover it today when I noticed a question on the Oracle Developers’ forum asking about reducing the size of a SYSAUX tablespace that currently included 400GB of free space.

Obviously the usual problem of used extents at the top end of a file can apply even for SYSAUX, and it’s highly likely that it will be necessary to move a few objects around (possibly within the tablespace, possibly out of the tablespace and back) but it’s easy to forget that some of the objects in SYSAUX are sufficiently important that they should only be moved using a procedure that has been supplied by Oracle Corp.

The list of contents of SYSAUX and, where they exist, the relevant procedure to move them are listed in view v$sysaux_occupants, so it’s always worth a quick check of the view before you do anything else – hence this little script:

rem
rem     Script:         sysaux_list.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2012
rem 


set linesize 156
set pagesize 60
set trimspool on
set tab off

spool sysaux_list.lst

-- execute print_table('select * from v$sysaux_occupants order by space_usage_kbytes')

column  schema_name             format a32
column  occupant_name           format a32
column  move_procedure          format a45
column  space_usage_kbytes      format 999,999,999

select 
        schema_name, occupant_name, move_procedure, space_usage_kbytes 
from 
        v$sysaux_occupants
order by 
        space_usage_kbytes
;

spool off

I don’t think there’s anything in the script that needs comment, apart from the optional line I’ve put in that uses Tom Kyte’s print_table() procedure to output rows in a tabular layout that produces a convenient full dump of the view. It’s a standard bit of defensive programming that helps me (sometimes) to avoid missing changes to view contents as versions are upgraded.

Here’s a small extract from the two sections of the output from a little VM sandbox of 19.11:

OCCUPANT_NAME                  : LOGMNR
OCCUPANT_DESC                  : LogMiner
SCHEMA_NAME                    : SYSTEM
MOVE_PROCEDURE                 : SYS.DBMS_LOGMNR_D.SET_TABLESPACE
MOVE_PROCEDURE_DESC            : Move Procedure for LogMiner
SPACE_USAGE_KBYTES             : 0
CON_ID                         : 3

-----------------

...

OCCUPANT_NAME                  : SM/OTHER
OCCUPANT_DESC                  : Server Manageability - Other Components
SCHEMA_NAME                    : SYS
MOVE_PROCEDURE                 :
MOVE_PROCEDURE_DESC            : *** MOVE PROCEDURE NOT APPLICABLE ***
SPACE_USAGE_KBYTES             : 72896
CON_ID                         : 3

-----------------

OCCUPANT_NAME                  : SDO
OCCUPANT_DESC                  : Oracle Spatial
SCHEMA_NAME                    : MDSYS
MOVE_PROCEDURE                 : MDSYS.MOVE_SDO
MOVE_PROCEDURE_DESC            : Move Procedure for Oracle Spatial
SPACE_USAGE_KBYTES             : 199552
CON_ID                         : 3

-----------------

32 row(s) selected



SCHEMA_NAME                      OCCUPANT_NAME                    MOVE_PROCEDURE                                SPACE_USAGE_KBYTES
-------------------------------- -------------------------------- --------------------------------------------- ------------------
SYSTEM                           LOGMNR                           SYS.DBMS_LOGMNR_D.SET_TABLESPACE                               0
...
AUDSYS                           AUDSYS                           DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_LOCATION                  39,488
SYS                              AO                               DBMS_AW.MOVE_AWMETA                                       45,696
XDB                              XDB                              XDB.DBMS_XDB_ADMIN.MOVEXDB_TABLESPACE                     62,144
SYS                              SM/OPTSTAT                                                                                 68,288
SYS                              SM/OTHER                                                                                   72,896
MDSYS                            SDO                              MDSYS.MOVE_SDO                                           199,552

32 rows selected.

Addendum

It’s a long time since I had to help someone clean a load of garbage from sysaux – typically from excess space usage by the audit table, the histogram history table, or the partition synopsis table – so I thought I’d take a quick look around the internet and MOS for any notes that might be good starting points for doing the job in a recent version of Oracle. The biggest unknown for me was the need to mess around inside a PDB, so this addendum is largely a pointer to get me started if I do need to do something with a PDB in the future.

Searching MOS for “v$sysaux_occupants” I noted a number of documents that could offer helpful comments; in particular I picked on “Doc ID 1499542.1: Reducing the Space Usage of the SQL Management Base in the SYSAUX Tablespace”, which made reference to the dbms_pdb package and the need to use dbms_pdb.exec_as_oracle_script() to move objects in a PDB’s sysaux tablespace, and the slightly more generic “Doc ID 2798505.1: How To Move Sys Table in PDB”, which made a similar point and highlighted Oracle error ORA-65040: operation not allowed from within a pluggable database.

Following this detail with a further search on Google I then found a note on Marco Mischke’s blog with the title “Shrink SYSAUX of a Pluggable Database”. The note is 6 years old, so shouldn’t be followed thoughtlessly – things change with time – but it’s a good starting point.

February 18, 2022

Index ITL Limit

Filed under: deadlocks,Indexing,Infrastructure,Locks,Oracle,trace files — Jonathan Lewis @ 5:02 pm GMT Feb 18,2022

Here’s a little script that I wrote more than 10 years ago when investigating some undesirable behaviour with indexes. I’ve just rediscovered it after seeing it mentioned in a comment to an old article that I had been prompted to revisit. This isn’t going to help you solve any specific problem, but it might give you some feel for how much work Oracle has to do to cater for efficient index maintenance.

The script is just a dirty little hack to crash a session by calling a procedure recursively until something breaks – in this case when all the ITL slots of an index leaf block are full and the block doesn’t split for the next insert (which was a little surprising in its own right).

rem
rem     Script:         itl_limit.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2010
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem             11.1.0.6
rem  

create table itl_limit(n1 number) pctfree 0;
create unique index il_01 on itl_limit(n1) pctfree 0;

create or replace procedure recursive_itl(i_tx_count number)
as
        pragma autonomous_transaction;
begin
        if i_tx_count != 0 then
                insert into itl_limit values(200 - i_tx_count);
                recursive_itl(i_tx_count - 1);
                commit;
        end if;
end;
/

alter session set events '10046 trace name context forever, level 8';

execute recursive_itl(200);

alter system checkpoint;

alter session set events '10046 trace name context off';

prompt  ==========================================
prompt  If there is no index on the table then you
prompt  should see 169 rows in one block and 31 in
prompt  the other. But if there is an index there
prompt  should be no rows thanks to the rollback
prompt  caused by the error.
prompt  ==========================================

select 
        dbms_rowid.rowid_block_number(rowid), count(*) 
from 
        itl_limit
group by 
        dbms_rowid.rowid_block_number(rowid)
;

prompt  =================================
prompt  Try for a tree dump on the index
prompt  after which you can dump the root
prompt  block to see the ITL entries
prompt  =================================

column object_id new_value m_object_id

select  object_id, object_type, object_name
from    user_objects
where   object_name = 'IL_01'
/

alter session set events 'immediate trace name treedump level &m_object_id ';

If you comment out the creation of the index il_01 then the script completes very quickly (complaining, of course, about the attempt to do a treedump with a null level – ORA-49100: Failed to process event statement). Assuming you’re using an 8KB block size the rowid count query will show that you’ve got 169 rows in one block and 31 rows in the other – and if you dump the block with 169 rows you will find that (a) there’s loads of empty space in the block, and (b) the number of ITL entries has reached 169, and that’s the limiting factor that restricted the number of rows we could insert.

If you create the index then you’ll have to wait roughly 142 seconds for the procedure call to fail (with an ORA-00060: deadlock detected error) and the script to complete. And when you generate the tkprof output from the trace file you’ll find that most of the time is spent in the following statement:

INSERT INTO ITL_LIMIT
VALUES
(200 - :B1 )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    169      0.96     143.69         11      56324       4939         168
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      170      0.96     143.69         11      56324       4939         168

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 138     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  LOAD TABLE CONVENTIONAL  ITL_LIMIT (cr=5 pr=0 pw=0 time=192 us starts=1)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                        11        0.00          0.00
  enq: TX - allocate ITL entry                   48        5.01        142.49
********************************************************************************

We’ve executed the insert 169 times, but reported only 168 rows, which suggests something went wrong on one insert. The 48 waits for “enq: TX – allocate ITL entry” invite further investigation, of course – so let’s get back to the raw trace file and find them.

grep -T -n  "WAIT.*alloc" or19_ora_15879.trc

 571   :WAIT #139861052053576: nam='enq: TX - allocate ITL entry' ela= 1000682 name|mode=1415053316 usn<<16 | slot=10420232 sequence=2 obj#=94939 tim=5285776687
 574   :WAIT #139861052053576: nam='enq: TX - allocate ITL entry' ela= 1016171 name|mode=1415053316 usn<<16 | slot=1310726 sequence=6209 obj#=94939 tim=5286793280
 577   :WAIT #139861052053576: nam='enq: TX - allocate ITL entry' ela= 1001580 name|mode=1415053316 usn<<16 | slot=10223624 sequence=2 obj#=94939 tim=5287795235
...
 709   :WAIT #139861052053576: nam='enq: TX - allocate ITL entry' ela= 4999483 name|mode=1415053316 usn<<16 | slot=11468804 sequence=2 obj#=94939 tim=5423905081
42381  :WAIT #139861052053576: nam='enq: TX - allocate ITL entry' ela= 2999710 name|mode=1415053316 usn<<16 | slot=9633800 sequence=2 obj#=94939 tim=5426905028

All 48 waits occur after the 169th attempt to insert a row. Oracle rotates through 12 ITL slots waiting one second on each, then goes round the loop again waiting 2 seconds on each, then 4 seconds, then 5 seconds – except it doesn’t wait on the 12th ITL on the final loop, instead it reports “DEADLOCK DETECTED (ORA-00060)” in the trace file and dumps a Deadlock graph of the form:

Deadlock graph:
                                          ------------Blocker(s)-----------  ------------Waiter(s)------------
Resource Name                             process session holds waits serial  process session holds waits serial
TX-00930008-00000002-0039DED3-00000000         44      49     X        18979      44      49           S  18979

Note the self-deadlock – the holder and waiter are the same session. After the deadlock graph we get the usual stack dump and after 48,000 lines of trace we see the message “Attempting to break deadlock by signaling ORA-00060” after which the session waits on the 12th ITL for 3 seconds and then the whole stack of autonomous transactions rolls back:

WAIT #139861052053576: nam='enq: TX - allocate ITL entry' ela= 2999710 name|mode=1415053316 usn<<16 | slot=9633800 sequence=2 obj#=94939 tim=5426905028
EXEC #139861052053576:c=429678,e=142567211,p=0,cr=504,cu=11,mis=0,r=0,dep=1,og=1,plh=0,tim=5427342894
ERROR #139861052053576:err=60 tim=5427342914
CLOSE #139861052053576:c=1,e=1,dep=1,type=3,tim=5427343141
XCTEND rlbk=1, rd_only=0, tim=5427343240
XCTEND rlbk=1, rd_only=0, tim=5427343336
...
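The arithmetic of that wait pattern can be checked with a quick sketch (my reading of the trace, not anything documented): 12 slots at 1, 2 and 4 seconds each, only 11 slots on the 5-second pass, plus the final 3-second wait after the ORA-00060 is signalled.

```python
# Sketch of the ITL wait pattern inferred from the trace (an interpretation,
# not documented behaviour): passes over the 12 ITL slots at 1s, 2s and 4s,
# then 11 slots at 5s (the 12th wait on that pass is replaced by the
# ORA-00060), followed by one final 3s wait after the error is signalled.
waits = [1] * 12 + [2] * 12 + [4] * 12 + [5] * 11 + [3]

print(len(waits))    # 48 waits in total
print(sum(waits))    # 142 seconds
```

The 142 seconds is consistent with the e=142,567,211 microseconds reported in the EXEC line of the trace.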

My test script reports the object_id (not the data_object_id) of the index and then does a treedump of it, which should show just a root block (which is also a leaf block) with no entries, looking something like this:

----- begin tree dump
leaf: 0x400008b 67109003 (0: row:0.0 avs:3988)
----- end tree dump

Notice how the available space (avs) in this root/leaf block is only 3988 bytes rather than roughly 8,000 for an 8KB block size. That’s because the ITL area has taken advantage of its ability to grow to half the size of the block, and once it has grown it doesn’t shrink (Note: there is a hard limit of 255 which will only become apparent with larger block sizes – and if you want to test that you’ll have to edit my script to change both occurrences of the constant 200 to (at least) 256).
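As a rough check on that arithmetic, here is a small sketch assuming the commonly quoted 24 bytes per ITL entry (that size, and the attribution of the residual bytes to other block overhead, are my assumptions):

```python
block_size = 8192
itl_entry_bytes = 24               # commonly quoted size of one ITL entry

# The ITL area can grow to roughly half the block:
max_itl_in_half_block = (block_size // 2) // itl_entry_bytes
print(max_itl_in_half_block)       # 170 - consistent with the 169 seen here

# Space consumed by the 169 ITL entries reported in the block dump:
print(169 * itl_entry_bytes)       # 4056 bytes

# Whatever is left after the ITL area and the reported avs of 3988
# would be the rest of the block header/overhead:
print(block_size - 169 * itl_entry_bytes - 3988)
```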

The data block address (DBA) of the root/leaf block is given in both hexadecimal and decimal; but if you can’t work out which file and block number this represents (I happen to know it’s file 16, and I can convert 0x8b to 139 decimal in my head) then you can check for the segment header block and add one to the block number; or use calls to the dbms_utility package to translate the DBA before doing a block dump:

SQL> select
  2          dbms_utility.data_block_address_file(67109003) file#,
  3          dbms_utility.data_block_address_block(67109003) block#
  4  from
  5          dual
  6  /

     FILE#     BLOCK#
---------- ----------
        16        139

1 row selected.

SQL> alter system flush buffer_cache;

System altered.

SQL> alter system dump datafile 16 block 139;

System altered.
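If you'd rather avoid the round trip to the database, the same split can be done with a couple of bit operations – a minimal sketch based on the widely known DBA layout (low 22 bits hold the block number, the remaining high bits the relative file number); the dbms_utility calls remain the supported route:

```python
def dba_to_file_block(dba: int) -> tuple[int, int]:
    # Low 22 bits hold the block number, the remaining high bits the
    # relative file number (the long-standing DBA layout).
    return dba >> 22, dba & ((1 << 22) - 1)

print(dba_to_file_block(67109003))    # (16, 139) - file 16, block 139
print(dba_to_file_block(0x400008b))   # the same address in hexadecimal
```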

Here’s the section of the resulting trace file that shows you how bad things had got before the deadlock appeared:

 seg/obj: 0x172dd  csc:  0x0000000001311be1  itc: 169  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x4000088 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000001311be1
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x04   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
...
0xa8   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0xa9   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
Leaf block dump

A final comment: if you’re wondering why the table (without an index) can get to 169 rows while the index achieves 168 rows and fails on the 169th – there’s one ITL in the index ITL area that is reserved for the “service ITL” (see answer 2), the one that Oracle uses to register and hold the block on an “index node split”.

Footnote

One of the entertaining things about poking around with Oracle is the way that you discover unexpected details – some of which are side-effects that you realise you should have predicted, some of which are just a little bizarre. In this case the easter egg in the trace file was the following statement (reproduced from the tkprof output, and made highly visible because I used the sort=execnt option):

update /*+ rule */ undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,
  undosqn=:7,xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13
where
 us#=:1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      336      0.00       0.00          0          0          0           0
Execute    336      0.02       0.02          3        336        531         336
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      672      0.03       0.03          3        336        531         336

The predictable side effect was that Oracle was going to create a number of new undo segments: as a standard detail of minimising contention Oracle tries to give every concurrent transaction (which means every one of my autonomous transactions) its own undo segment.

The surprising detail was the /*+ rule */ hint – still there in 19.11.0.0. I guess it was there to ensure that Oracle executed the update through an index access path, but a suitable /*+ index() */ hint would probably be more appropriate in almost every version of Oracle.

There were a number of other segment/undo related statements that operated 140+ times in the course of this test – which did make me wonder if a very busy OLTP system (doing lots of tiny, concurrent transactions) with a lot of undo segments could spend a significant amount of its time managing undo segments – but that’s a thought for another day. Another “look at that some time” thing that appeared was the large number of selects and inserts to a table called undohist$.

January 25, 2022

255 column catalogue

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 12:17 pm GMT Jan 25,2022

This is a short list of articles discussing the problems (almost always) and oddities that appear when you define a table with more than 255 “real” columns. Odd details are always subject to change over time/version – so if it matters, test it and check it.

Articles from other authors:

January 19, 2022

Hash Aggregation – 2

Filed under: Execution plans,Infrastructure,Oracle,Performance,Tuning — Jonathan Lewis @ 12:03 pm GMT Jan 19,2022

In the note I wrote a couple of days ago about the way the optimizer switches from hash group by to sort group by if you add an order by X,Y clause to a query that has a group by X,Y I mentioned that this had been something I’d noticed about 15 years ago (shortly after Oracle introduced hash aggregation, in fact) but it was only the second of two details I’d noted when experimenting with this new operation. The first thing I’d noticed came from an even simpler example, and here’s a (cryptic) clue to what it was:


column operation_type format a24

select
        operation_type, count(*)
from
        v$sql_workarea
group by
        operation_type
order by
        operation_type
;

OPERATION_TYPE             COUNT(*)
------------------------ ----------
APPROXIMATE NDV                   1
BUFFER                          130
CONNECT-BY (SORT)                10
GROUP BY (HASH)                  12
GROUP BY (SORT)                 240
HASH-JOIN                       466
IDX MAINTENANCE (SORT)           39
LOAD WRITE BUFFERS               10
RANGE TQ COMPUTATION             13
RESULT CACHE                      4
SORT (v1)                        10
SORT (v2)                       147
WINDOW (SORT)                    35

The clue isn’t in the query, it’s in what’s missing from the result set, so here’s some code to create and query some data to make it easier to spot the anomaly:

rem
rem     Script:         hash_agg.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sept 2007
rem

create table t1
as
with generator as (
        select  rownum id
        from    dual
        connect by
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        lpad(mod(rownum-1,1000),6)      small_vc_K,
        lpad(rownum-1,6)                small_vc_M
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid wordpress format issue
;

set serveroutput off

prompt  ===========================
prompt  1000 distinct values (hash)
prompt  ===========================

select
        /*+ gather_plan_statistics 1000 */
        count(*)
from
        (
        select  /*+ no_merge */
                distinct small_vc_K
        from
                t1
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

prompt  ===========================
prompt  1000 distinct values (sort)
prompt  ===========================

select
        /*+ gather_plan_statistics 1000 */
        count(*)
from
        (
        select  /*+ no_merge no_use_hash_aggregation */
                distinct small_vc_K
        from
                t1
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));
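As a quick sanity check on the data model (a sketch of the mod() arithmetic, not a substitute for running the script): small_vc_K is built as lpad(mod(rownum-1,1000),6), so the million rows should carry exactly 1,000 distinct values.

```python
# Model the small_vc_K column from the script above:
# lpad(mod(rownum-1,1000),6) applied to rownum 1..1e6.
n_rows = 1_000_000
distinct_k = len({(r - 1) % 1000 for r in range(1, n_rows + 1)})
print(distinct_k)    # 1000 distinct values across 1,000,000 rows
```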

I’ve added the /*+ gather_plan_statistics */ hint to the query so that I can check on the rowsource execution stats and (particularly) the memory and/or temporary space used; and I’ve blocked hash aggregation in one of the two queries, so I expect to see a “hash unique” operation in the first query and a “sort unique” operation in the second. Here’s what I get from 19.11.0.0:


===========================
1000 distinct values (hash)
===========================

  COUNT(*)
----------
      1000

SQL_ID  1baadqgv02r6b, child number 0
-------------------------------------
select  /*+ gather_plan_statistics 1000 */  count(*) from  (  select
/*+ no_merge */   distinct small_vc_K  from   t1  )

Plan hash value: 171694178

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |   765 (100)|      1 |00:00:00.07 |    2637 |   2632 |       |       |          |
|   1 |  SORT AGGREGATE      |      |      1 |      1 |            |      1 |00:00:00.07 |    2637 |   2632 |       |       |          |
|   2 |   VIEW               |      |      1 |   1000 |   765  (56)|   1000 |00:00:00.07 |    2637 |   2632 |       |       |          |
|   3 |    HASH UNIQUE       |      |      1 |   1000 |   765  (56)|   1000 |00:00:00.07 |    2637 |   2632 |  1889K|  1889K| 1417K (0)|
|   4 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   405  (17)|   1000K|00:00:00.04 |    2637 |   2632 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

17 rows selected.

===========================
1000 distinct values (sort)
===========================

  COUNT(*)
----------
      1000

SQL_ID  a66rqhgw7a6dk, child number 0
-------------------------------------
select  /*+ gather_plan_statistics 1000 */  count(*) from  (  select
/*+ no_merge no_use_hash_aggregation */   distinct small_vc_K  from
t1  )

Plan hash value: 1750119335

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |   765 (100)|      1 |00:00:00.22 |    2637 |       |       |          |
|   1 |  SORT AGGREGATE      |      |      1 |      1 |            |      1 |00:00:00.22 |    2637 |       |       |          |
|   2 |   VIEW               |      |      1 |   1000 |   765  (56)|   1000 |00:00:00.22 |    2637 |       |       |          |
|   3 |    SORT UNIQUE       |      |      1 |   1000 |   765  (56)|   1000 |00:00:00.22 |    2637 | 48128 | 48128 |43008  (0)|
|   4 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   405  (17)|   1000K|00:00:00.02 |    2637 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Look back at the summary of v$sql_workarea. Can you now spot something that might be missing? Since we’ve now got two execution plans with their sql_ids, let’s run a much more precise query against the view.

select
        sql_id, child_number, operation_id, operation_type ,
        total_executions, last_memory_used/1024 last_mem_kb
from
        v$sql_workarea
where
        sql_id in (
                '1baadqgv02r6b',
                'a66rqhgw7a6dk'
)
order by
        sql_id, child_number, operation_id
/

SQL_ID        CHILD_NUMBER OPERATION_ID OPERATION_TYPE       TOTAL_EXECUTIONS LAST_MEM_KB
------------- ------------ ------------ -------------------- ---------------- -----------
1baadqgv02r6b            0            3 GROUP BY (HASH)                     1       1,417
a66rqhgw7a6dk            0            3 GROUP BY (SORT)                     1          42

The first thing to note, of course, is that the sort aggregate at operation 1 doesn’t report a workarea at all – but this shouldn’t be a surprise: the operation is simply counting rows as they arrive, so there’s no real sorting going on.

Operation 3 in both cases is the more interesting one. In the plan it’s reported as a “hash/sort unique” but the workarea operation has changed this to a “group by (hash/sort)”.  It’s a little detail that probably won’t matter to most people most of the time – but it’s the sort of thing that can waste a couple of hours of time when you’re trying to analyze a performance oddity.

And another thing …

You’ll have noticed, by the way, that the hash unique operation actually demanded far more memory than the sort unique operation; 1.4MB compared to 42KB. In a large production system this may be fairly irrelevant, especially since the discrepancy tends to disappear for higher volumes, and hardly matters if the operation spills to disc. However, if you have a large number of processes doing a lot of small queries using hash aggregation you may find that cumulatively they use up an unexpected fraction of your pga_aggregate_target.

There is some scope (with the usual “confirm with Oracle support” caveat) for modifying this behaviour with a fix_control:

16792882    QKSFM_COMPILATION_16792882  Cardinality threshold for hash aggregation(raised to power of 10)

alter session set "_fix_control"='16792882:3';

The default value is 0, the maximum value seems to be 3, and the description about raising to the power of 10 seems to be the wrong way round, but I haven’t done any careful testing. When I set the value to 3 the hash unique with an estimated output of 1,000 switched to a sort unique (hence my suggestion that the threshold is 10^N rather than N^10).
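My guess at the arithmetic, expressed as a sketch (this is an interpretation of the one observed switch, not a documented formula, and the function name is mine):

```python
def prefers_sort_aggregation(estimated_rows: int, fix_level: int) -> bool:
    # Hypothetical reading of fix_control 16792882: level N sets a
    # cardinality threshold of 10**N at or below which the optimizer
    # switches from hash aggregation to sort aggregation.
    return estimated_rows <= 10 ** fix_level

print(prefers_sort_aggregation(1000, 3))   # True  - matches the observed switch
print(prefers_sort_aggregation(1000, 0))   # False - default: hash aggregation
```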

Footnote

In the previous note I pointed out that Oracle would use a single sort group by operation rather than a consecutive hash group by / sort order by pair of operations. This simpler example helps to explain why. If you check the CBO trace files for the two you can check the costs of the inline aggregate view.

From a 19.11.0.0 trace file here are two costing fragments for the hash group by strategy; the first bit is the cost of the tablescan that acquires the rows (operation 4) the second is the final cost of the inline view / query block:


  Best:: AccessPath: TableScan
         Cost: 404.639881  Degree: 1  Resp: 404.639881  Card: 1000000.000000  Bytes: 0.000000

Final cost for query block SEL$2 (#2) - All Rows Plan:
  Best join order: 1
  Cost: 764.843155  Degree: 1  Card: 1000000.000000  Bytes: 7000000.000000
  Resc: 764.843155  Resc_io: 337.000000  Resc_cpu: 1069607888
  Resp: 764.843155  Resp_io: 337.000000  Resc_cpu: 1069607888

I won’t show you the equivalent extracts for the example where I blocked hash aggregation because the numbers are identical. So there’s no way that Oracle will want to do a hash group by followed by a sort order by when it can do just a single sort group by that costs exactly the same as the hash group by operation alone. This is a shame, and a little ironic, because when Oracle Corp introduced hash aggregation they made a bit of a fuss about how much more efficient it was than sorting – but it looks like no-one told the optimizer about this.

January 10, 2022

Control File Waits

Filed under: Infrastructure,LOBs,Oracle,Performance — Jonathan Lewis @ 2:10 pm GMT Jan 10,2022

Originally drafted 4th Aug 2011, this is a blog note that has been waiting a little over 10 years to be completed and published; and it’s probably worth doing it even now since the topic came up on the Oracle database forum just a couple of weeks ago.

The original title was CF Enqueues but I’ve combined it with notes from another draft called LOB write waits to get to the final title because a couple of small variations in my test script can produce all the annoying symptoms you may want to investigate in that area.

The forum note that triggered completion and publication was a complaint that’s common: “why is this so slow” but with a cause that comes up less frequently: “BLOB operations and control file parallel writes”. The OP was running Oracle 19.9 but was suffering from a problem which (when I checked my archive and MOS) turned out to have a history (and a workaround) that started with 8.0.4. Their performance issue was due to their BLOB definition which, with the storage information removed, was as follows:

LOB (CONTENT) STORE AS BASICFILE (
        ENABLE     STORAGE IN ROW
        CHUNK      32768
        PCTVERSION 10
        NOCACHE
        NOLOGGING
)

Note, particularly, the key words basicfile, nocache, and nologging. You should not be using basicfile LOBs in 19c, and both the nocache and nologging options can introduce problems of I/O and enqueue contention.

The impact of I/O and enqueue contention depends, of course, on the nature of the work being done – you might be wasting effort or losing time, but it might be a waste of resources that you’re not worried about and time that is effectively invisible to the end-user. For this user the task looked like an example of a “batch running as a pl/sql loop” repeatedly executing a statement similar to the following:


insert into table2(blob_column2)
select  blob_column1 
from    table1
where   id = :B1

A large fraction of the processing time was spent on “control file parallel write” (a consequence of nologging basicfile LOBs) and a further significant fraction was spent on “direct path write” (a consequence of nocache basicfile LOBs).

To understand the problem of the control file parallel write waits you need to know that every time a session makes a nologging change to a data file it also has to update an “unrecoverable SCN” field in the controlfile entry for that file. But if you’re inserting lots of individual LOBs then that turns into a lot of control file writes and the time spent on those writes, and the time spent waiting for other sessions to complete their writes and release the CF enqueue that might be protecting the control file from conflicting writes, can become significant.

As I said, this problem was first noted in 8.0.4, and an event was created to disable this control file update for LOBs (which were only of the basicfile variant at the time). The event was 10359 and the entry in $ORACLE_HOME/rdbms/mesg/oraus.msg for this event (which still has the same effect in 19.9) says:


10359, 00000, "turn off updates to control file for direct writes"
// *Cause:
// *Action:  Control files won't get updated for direct writes for  LOBs
//           when NOCACHE NOLOGGING is set. The only bad impact that it
//           can have is that if you are using the recovery manager,
//           it may affect a warning that says that the user should
//           back everything up. Now the recovery manager won't know
//           to tell you that the files that were updated with
//           unrecoverable events should be backed up.

Setting this event solved the OP’s performance problem.

It doesn’t really sound very nice, of course, setting an event that stops Oracle from telling you that “the files … should be backed up”, but the text does say the only bad impact is that it “may affect a warning”, not that future attempts at recovery are going to fail. In fact there’s also MOS Doc ID: 1058851.6 “Performance Degradation as a result of ‘enq: CF – contention'” (last updated March 2019 despite only having references to Oracle 8.0.4) which says (my emphasis):

If event 10359 is set to level 1, update of the control file with invalidation
redo is stopped.  This will not affect any recovery, but note that recovery
manager reports will be stale.

With event 10359 out of the way (but see footnote), I want to get back to my comment about switching to securefile LOBs. Here’s a very simple script I wrote a few years ago (and tweaked occasionally thereafter) to test the performance characteristics of LOBs:

rem
rem     Script: lob_write_waits.sql
rem     Author: Jonathan Lewis
rem     Dated:  Aug 2014
rem

drop table test_lobs;

@@setup

create table test_lobs (
        id              number(5),
        bytes           number(38),
        text_content    clob
)
lob (text_content)
store as 
--      securefile
        basicfile
        text_lob(
        disable storage in row
        nocache nologging
--
--      cache logging compress low                              --  0.26
--      cache reads logging compress low                        --  1.00
--      nocache logging compress low                            --  1.00
--
--      cache nologging compress low                            -- 14.00
--      cache reads nologging compress low                      --  1.00
--      nocache nologging compress low                          --  1.00
--
--      cache filesystem_like_logging compress low              -- 11.00
--      cache reads filesystem_like_logging compress low        --  1.00
--      nocache filesystem_like_logging compress low            --  1.00
)
;

insert into test_lobs values (1,1000, rpad('x',3000));
commit;

execute snap_my_stats.start_snap
execute snap_events.start_snap
execute snap_enqueues.start_snap

set serveroutput off
set feedback off

-- alter session set events '10359 trace name context forever, level 1';
-- alter session set db_unrecoverable_scn_tracking=false;
-- alter session set events '10046 trace name context forever, level 8';

declare
        c1 clob;
begin
        for r in 1..1000 loop
                insert into test_lobs values(r, 1000, lpad(r,1000,0));
                commit;
        end loop;
end;
/

commit;

alter session set events '10046 trace name context off';
alter session set events '10359 trace name context off';
alter session set db_unrecoverable_scn_tracking=true;

spool lob_write_waits

set serveroutput on
set feedback on

execute snap_enqueues.end_snap
execute snap_events.end_snap
execute snap_my_stats.end_snap

spool off

The script creates a table with a LOB column and inserts one row into it to avoid minor details of object creation and/or block formatting affecting the results I see from a loop test inserting 1000 rows one at a time with commit. You’ll notice that I’ve got lots of possible configurations for the LOB segment, including a choice between basicfile and securefile.

As it stands it will create a basicfile lob with the nocache nologging attributes (all the other options I have commented out are possible only for securefile LOBs thanks to the compress low clause.) Apart from the loop the script also allows a couple of parameters to be changed, or events to be set; and captures snapshots of sundry statistics.

If I allow the controlfile activity to take place, the following snippet shows the highlights of what my basicfile LOB does for me (or to me!) in 21c when the database is in ARCHIVELOG mode (which you should be if you are trying to test the effects of any unrecoverable or nologging operation – the database is not in force_logging mode, by the way):

----------------------------------
System enqueues
Interval:-      6 seconds
----------------------------------
Type    Requests       Waits     Success      Failed    Wait m/s Reason
----    --------       -----     -------      ------    -------- ------
CF         2,006           0       2,006           0           0 contention
XR         1,000           0       1,000           0           0 database force logging
TM         1,002           0       1,002           0           0 contention
TX         1,031           0       1,031           0           0 contention

---------------------------------------------------------
Session Events
Interval:-      6 seconds
---------------------------------------------------------
Event                                             Waits   Time_outs            Csec    Avg Csec    Max Csec
-----                                             -----   ---------            ----    --------    --------
direct path sync                                  1,000           0           125.32        .125           6
control file sequential read                      5,014           0             5.50        .001           0
control file parallel write                       4,000           0           385.75        .096           6
direct path write                                 1,000           0             3.02        .003           5

---------------------------------
Session stats
Interval:-  6 seconds
---------------------------------
Name                                                                         Value
----                                                                         -----
total cf enq hold time                                                       3,953
total number of cf enq holders                                               1,000
physical writes                                                              1,000
physical writes direct                                                       1,000
physical writes direct (lob)                                                 1,000
redo entries                                                                 2,638
redo size                                                                1,647,276
redo size for direct writes                                                102,460
KTFB alloc space (block)                                                 9,437,184
ASSM gsp:get free block                                                      1,014
lob writes                                                                   1,000

As you can see, there’s a significant penalty due to the CF (controlfile) enqueues and control file reads and writes; and if I simply set event 10359 to level 1 (or set db_unrecoverable_scn_tracking to false) the most significant change is that the CF and control file I/O disappear and the job takes about 2 seconds instead of 6.
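A rough reconciliation of that saving from the figures above, assuming the whole difference is the control file activity (my attribution, not something the stats state directly):

```python
# Csec figures taken from the session events snapshot above.
cf_parallel_write = 385.75   # control file parallel write
cf_sequential_read = 5.50    # control file sequential read

saved_seconds = (cf_parallel_write + cf_sequential_read) / 100
print(round(saved_seconds, 1))   # ~3.9 of the ~4 seconds saved (6s -> 2s)
```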

There’s nothing particularly startling in the rest of the session activity stats, but since the activity is nologging it’s interesting to note that the redo size is nowhere near the 8MB we might expect (8KB blocks x 1,000 LOBs) even though we can see that Oracle has had to allocate 9MB of space for (mostly) the LOB data segment. Each nologging/unrecoverable LOB data block has averaged about 100 bytes of redo for tracking unrecoverable blocks. (When I changed the LOB chunk size to 32K this KTFB allocation increased to 32MB – but that shouldn’t be a surprise.)
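The per-block figure is just the ratio of two of the statistics in the snapshot above:

```python
redo_for_direct_writes = 102_460   # "redo size for direct writes" stat
lob_blocks_written = 1_000         # "physical writes direct (lob)" stat

# Average redo generated per nologging LOB block, for tracking
# unrecoverable blocks:
print(redo_for_direct_writes / lob_blocks_written)   # ~102 bytes per block
```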

When I switch to securefile LOBs (with the nocache nologging options) the results look like this:

----------------------------------
System enqueues
Interval:-      2 seconds
----------------------------------
Type    Requests       Waits     Success      Failed    Wait m/s Reason
----    --------       -----     -------      ------    -------- ------
TM         1,001           0       1,001           0           0 contention
TX         2,073           0       2,073           0           0 contention
DW         1,057           0       1,057           0           0 contention

---------------------------------------------------------
Session Events
Interval:-      2 seconds
---------------------------------------------------------
Event                                             Waits   Time_outs            Csec    Avg Csec    Max Csec
-----                                             -----   ---------            ----    --------    --------
direct path write                                 1,000           0            85.72        .086           4

---------------------------------
Session stats
Interval:-  2 seconds
---------------------------------
Name                                                                         Value
----                                                                         -----
redo entries                                                                 7,737
redo size                                                                3,594,280
KTFB alloc space (block)                                                 1,114,112
securefile allocation bytes                                              8,192,000
securefile allocation chunks                                                 1,000
securefile direct write bytes                                            8,192,000
securefile direct write ops                                                  1,000
securefile bytes non-transformed                                         2,000,000

If I use the compress low option at the same time the statistic about securefile bytes non-transformed drops to zero – not surprisingly – and I see 1,000 WG enqueues with reason “delete fso” and 3,000 WG enqueues with reason “lock fso”, but the timing is basically unchanged. The DW enqueue is the “In memory Dispenser” enqueue, where the id1/id2 reference a tablespace and block address, and the WG enqueue is the “Write gather local enqueue”, where the id1/id2 report the LOB ID (in two halves).

The most significant change, though, and the one that made me spend time on looking at results and traces etc. was what happened when I set the system to use “cache nologging” (or “cache filesystem_like_logging”) – the effect was really surprising; the report on enqueues didn’t show any significant differences (the same WG enqueues appeared if I enabled compression), but this is what happened with the session events:

---------------------------------------------------------
Session Events
Interval:-      12 seconds
---------------------------------------------------------
Event                                             Waits   Time_outs            Csec    Avg Csec    Max Csec
-----                                             -----   ---------            ----    --------    --------
write complete waits                              1,000           0         1,007.02       1.007           5

For every LOB write there was a write complete wait that averaged close to one centisecond. That’s a fairly suspect number, looking more like a time-out from a self-timer than a wait terminated by a signal that some action has completed. After thinking about what might be happening and browsing through the session activity stats I noticed the following clue:

---------------------------------
Session stats
Interval:-  12 seconds
---------------------------------
Name                                                                         Value
----                                                                         -----
messages sent                                                                2,000

That looks like two messages sent for each LOB inserted – which I confirmed by doing a few more tests with different numbers of LOBs. So who was receiving those messages (and not replying to them)? I re-ran the test taking a snapshot of the session stats for every session in the instance and extracted (using grep) the figures for just the “messages sent” and “messages received” statistics. Here are the results, rearranged slightly:

SID:   283: - oracle - oracle@linux21.localdomain (W003)
messages sent                                                              4
SID:   293: - oracle - oracle@linux21.localdomain (W006)
messages sent                                                              3
SID:   294: - oracle - oracle@linux21.localdomain (W005)
messages sent                                                              4
SID:    20: - oracle - oracle@linux21.localdomain (W000)
messages sent                                                              8
SID:    33: - oracle - oracle@linux21.localdomain (W002)
messages sent                                                              2
SID:    44: - oracle - oracle@linux21.localdomain (W007)
messages sent                                                              3

SID:    51:TEST_USER - jonathan - sqlplus@linux21.localdomain (TNS V1-V3)
messages sent                                                          2,004    (2,000)
SID:   263: - oracle - oracle@linux21.localdomain (DBW0)
messages sent                                                          1,989    (2,000)
messages received                                                      3,021    (3,000)
SID:    11: - oracle - oracle@linux21.localdomain (LGWR)
messages sent                                                          1,021    (1,000)
messages received                                                      2,019    (2,000)

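For reference, a query along the following lines will pull those two statistics for every session in the instance. (This is a sketch of the approach, not the snapshot scripts I actually used – those report the change in value across the test interval rather than the cumulative figure.)

select
        se.sid, se.program, sn.name, st.value
from
        v$session       se,
        v$sesstat       st,
        v$statname      sn
where
        st.sid        = se.sid
and     sn.statistic# = st.statistic#
and     sn.name in ('messages sent', 'messages received')
and     st.value     != 0
order by
        se.sid, sn.name
;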
If we ignore the messages sent by Wnnn (which, I believe, go to DBW0) and allow for a few timing/rounding errors, we might decide to work with the figures in brackets above. After a few extra tests, changing “commit;” to “commit write wait;” or commenting out the “commit;” altogether, my best interpretation (i.e. guess) of these numbers came down to:

  • The user session sends 1,000 messages to DBWR (to write the nologging LOB) and 1,000 messages to LGWR to write the “ordinary” table redo (the number 1,000 also shows up in the “redo synch writes” stat and the “log file sync” waits if we use “commit write wait”)
  • LGWR receives 1,000 messages from the session, and 1,000 messages from DBW0 (which would relate to “LOB management” with redo OP code 26.4 and 13.53) and sends 1,000 completion messages to DBW0.
  • DBW0 receives 1,000 messages from the session and 1,000 messages from LGWR, and sends 1,000 “log LOB Maintenance” messages to LGWR.

And if you’ve been counting carefully you’ll have noticed that there are still 1,000 messages sent and received by DBW0 that I haven’t been able to account for. It’s possibly a “double-counting” error; or maybe there’s another process involved in a dialogue with DBW0 that doesn’t record its end of the dialogue, or maybe it’s DBW0 sending messages to itself(!). At some point I may remember how to switch on message tracing and check exactly which messages are sent from where to where, but for the moment I’m going to put a line under the investigation and just highlight the critical effect.

Whatever’s going on, a key detail from the experiments above is that the session does not seem to get any sort of message telling it that the unlogged LOB is safely on disc, it just goes to sleep for 1/100 second – presumably checking the buffer status when it wakes up to decide whether or not the cached LOB has been written.
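If you want to check the effect on your own system, a convenient (if less rigorous) alternative to snapshotting the stats is to query v$session_event for your own session immediately after a test run:

select
        event, total_waits, total_timeouts, time_waited, average_wait
from
        v$session_event
where
        sid   = sys_context('userenv','sid')
and     event = 'write complete waits'
;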

Footnote

You may have come across the parameter db_unrecoverable_scn_tracking which appeared some time around 11.2.0.4 with the description: “Track nologging SCN in controlfile”. If this parameter is set to true (the default value) at instance startup then you can use event 10359 to switch the feature on and off at the session (and possibly system) level. More elegantly, though, you can set the parameter at the session level instead of using the numeric event to address the problem:

alter session set db_unrecoverable_scn_tracking=false;
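For completeness, the event-based mechanism looks like the following – my understanding is that setting the event at level 1 disables the controlfile update and switching the event off re-enables it:

alter session set events '10359 trace name context forever, level 1';
alter session set events '10359 trace name context off';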

I would advise against setting the value to false at instance startup, or even through a call to “alter system”; there’s a note in Doc ID 12360160.8 (“ALTER SYSTEM / SESSION of DB_UNRECOVERABLE_SCN_TRACKING”) that if the parameter is set to false at startup it can’t be changed dynamically thereafter. (I haven’t tested whether this last detail is still true.)

tl;dr

You really shouldn’t be using basicfile LOBs with newer versions of Oracle; you need to convert them to securefile LOBs.
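The simplest conversion is a table move along the following lines (the table and column names here are purely illustrative – and remember that a move rebuilds the table and invalidates its indexes unless you can use the online option available in newer versions):

alter table t1 move
        lob (c1) store as securefile (cache)
;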

If you do use basicfile LOBs then you may need to set event 10359 (or adjust the db_unrecoverable_scn_tracking parameter) if you want them to be nologging LOBs.

Even with securefile LOBs the options “cache nologging” and “cache filesystem_like_logging” introduce a surprising I/O wait time of 1/100 second per LOB, so those are two options to avoid. All the other securefile variations seemed to behave nicely (for me), but the amount of undo and redo can vary considerably with the option you choose, so even with the 1/100 second per LOB time penalty you might still decide that one of these two options is worth using.
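For reference, the two storage clauses in question appear in a table definition like this (table and column names are mine):

create table t1 (
        id      number,
        c1      clob
)
lob (c1) store as securefile (
        cache nologging
--      cache filesystem_like_logging   -- the alternative option
);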

 
