Several years ago I published a couple of examples of scripts that I’d been using since Oracle 6 (with minor changes over time) to analyse the content of tables in some detail. The first was a breakdown of the lengths of the rows in the table, the second was a map showing the distribution of nulls in the rows. There used to be a third component of the analysis which produced a report of the number of non-null entries for each column in the table, but by the time I published the first two parts there didn’t seem to be much point in publishing the third since you could get a sufficiently accurate picture by querying view user_tab_columns after gathering stats:
break on num_rows select tab.num_rows, col.column_name , col.num_nulls from user_tables tab join user_tab_columns col on col.table_name = tab.table_name where tab.table_name = 'T1' order by col.num_nulls desc, col.column_id / NUM_ROWS COLUMN_NAME NUM_NULLS ---------- -------------------- ---------- 56777 EDITION_NAME 56777 CREATED_APPID 56777 CREATED_VSNID 56777 MODIFIED_APPID 56777 MODIFIED_VSNID 56777 SUBOBJECT_NAME 56570 DATA_OBJECT_ID 55353 DEFAULT_COLLATION 51058 EDITIONABLE 40216 OWNER 0 OBJECT_NAME 0 OBJECT_ID 0 OBJECT_TYPE 0 CREATED 0 LAST_DDL_TIME 0 TIMESTAMP 0 STATUS 0 TEMPORARY 0 GENERATED 0 SECONDARY 0 NAMESPACE 0 SHARING 0 ORACLE_MAINTAINED 0 APPLICATION 0 DUPLICATED 0 SHARDED 0
In this example t1 is a copy of view all_objects and I’ve listed the columns in order of most frequently null with the table’s num_rows reported at the top for comparison purposes. If you were going to do something that made it a good idea to rebuild the table you could use this report as a guide to re-arranging the column order to position the most frequently null columns towards the end of the row (saving the “trailing nulls take no space” length bytes).
This approach depends on the stats being up to date and accurate, of course, so if you didn’t have them, and didn’t want to collect them, another strategy would be to run a query like the following:
select to_char(count(OWNER),'999,999,999,999,999') OWNER, to_char(count(OBJECT_NAME),'999,999,999,999,999') OBJECT_NAME, to_char(count(SUBOBJECT_NAME),'999,999,999,999,999') SUBOBJECT_NAME, to_char(count(OBJECT_ID),'999,999,999,999,999') OBJECT_ID, to_char(count(DATA_OBJECT_ID),'999,999,999,999,999') DATA_OBJECT_ID, to_char(count(OBJECT_TYPE),'999,999,999,999,999') OBJECT_TYPE, to_char(count(CREATED),'999,999,999,999,999') CREATED, to_char(count(LAST_DDL_TIME),'999,999,999,999,999') LAST_DDL_TIME, to_char(count(TIMESTAMP),'999,999,999,999,999') TIMESTAMP, to_char(count(STATUS),'999,999,999,999,999') STATUS, to_char(count(TEMPORARY),'999,999,999,999,999') TEMPORARY, to_char(count(GENERATED),'999,999,999,999,999') GENERATED, to_char(count(SECONDARY),'999,999,999,999,999') SECONDARY, to_char(count(NAMESPACE),'999,999,999,999,999') NAMESPACE, to_char(count(EDITION_NAME),'999,999,999,999,999') EDITION_NAME, to_char(count(SHARING),'999,999,999,999,999') SHARING, to_char(count(EDITIONABLE),'999,999,999,999,999') EDITIONABLE, to_char(count(ORACLE_MAINTAINED),'999,999,999,999,999') ORACLE_MAINTAINED, to_char(count(APPLICATION),'999,999,999,999,999') APPLICATION, to_char(count(DEFAULT_COLLATION),'999,999,999,999,999') DEFAULT_COLLATION, to_char(count(DUPLICATED),'999,999,999,999,999') DUPLICATED, to_char(count(SHARDED),'999,999,999,999,999') SHARDED, to_char(count(CREATED_APPID),'999,999,999,999,999') CREATED_APPID, to_char(count(CREATED_VSNID),'999,999,999,999,999') CREATED_VSNID, to_char(count(MODIFIED_APPID),'999,999,999,999,999') MODIFIED_APPID, to_char(count(MODIFIED_VSNID),'999,999,999,999,999') MODIFIED_VSNID, to_char(count(*),'999,999,999,999,999') row_count fromi t1 ;
You don’t really need the to_char() function, but it’s a conveience for what I’m going to do with the SQL. Obviouslty it would be a bit tedious to create this statement by hand so, as I did in the previous “rowsize” notes, I’ve written some code to generate it for me:
rem rem row_size_2b.sql rem rem Generate SQL to report counts of rem non-null columns in a table. rem set linesize 32000 set feedback off define m_table = '&1' declare m_string varchar2(32000) := 'select '; m_cursor sys_refcursor; begin for r in ( select column_name, data_type from user_tab_columns where table_name = upper('&m_table') ) loop m_string := m_string || ' to_char(count(' || trim(r.column_name) || '),''999,999,999,999,999'') ' || trim(r.column_name) || ',' || chr(10) ; end loop; m_string := m_string || ' to_char(count(*),''999,999,999,999,999'') row_count from &m_table'; -- dbms_output.put_line(m_string); print_table(m_string); end; / set linesize 156
The script accepts a table name from the user’s schema (you could edit this to query dba_tab_columns), constructs a string (as shown above – though I’ve done a little cosmetic work on it), and then passes the resulting string to a verion of Tom Kyte’s print_table() routine which produces the following output:
OWNER : 56,777 OBJECT_NAME : 56,777 SUBOBJECT_NAME : 207 OBJECT_ID : 56,777 DATA_OBJECT_ID : 1,424 OBJECT_TYPE : 56,777 CREATED : 56,777 LAST_DDL_TIME : 56,777 TIMESTAMP : 56,777 STATUS : 56,777 TEMPORARY : 56,777 GENERATED : 56,777 SECONDARY : 56,777 NAMESPACE : 56,777 EDITION_NAME : 0 SHARING : 56,777 EDITIONABLE : 16,561 ORACLE_MAINTAINED : 56,777 APPLICATION : 56,777 DEFAULT_COLLATION : 5,719 DUPLICATED : 56,777 SHARDED : 56,777 CREATED_APPID : 0 CREATED_VSNID : 0 MODIFIED_APPID : 0 MODIFIED_VSNID : 0 ROW_COUNT : 56,777 ----------------- 1 rows selected
As with many of the scripts I’ve pubished, the task is one I rarely do, so I haven’t bothered to add in all the extra little refinements that might make it foolproof and totally self-contained. In this case, for example I’d have to go through a final couple of manual steps to edit the output by deleteing the last few lines then (since I use vi) issuing “!%!sort -k3 -n -r” which would sort the results in reverse order, numerically, according to the third field.
Comments and related questions are welcome.