It’s amazing how easy it is to interpret a number incorrectly until the point comes where you have to look at it closely – and then you realise that there was a lot more to the number than your initial casual assumption, and you would have realised it all along if you’d ever needed to think about it before.
Here’s a little case in point. I have a simple (i.e. non-partitioned) heap table t1, which is basically a clone of the view dba_segments, and I’ve just connected to Oracle through a SQL*Plus session and run a couple of SQL statements. The following is a continuous log of my activity:
SQL> select table_name, partitioned, blocks from user_tables;

TABLE_NAME           PAR     BLOCKS
-------------------- --- ----------
T1                   NO         958

1 row selected.

SQL> select name, value
  2  from   v$mystat ms, v$statname sn
  3  where  ms.statistic# = sn.statistic#
  4  and    name like '%table scan%';

NAME                                  VALUE
-------------------------------- ----------
table scans (short tables)                9
table scans (long tables)                 0
table scans (rowid ranges)                0
table scans (cache partitions)            0
table scans (direct read)                 0
table scan rows gotten                 4188
table scan blocks gotten                 14

7 rows selected.

SQL> select count(extents) from t1;

COUNT(EXTENTS)
--------------
          5143

1 row selected.

SQL> select name, value
  2  from   v$mystat ms, v$statname sn
  3  where  ms.statistic# = sn.statistic#
  4  and    name like '%table scan%';
- My cache size is quite small, so t1 doesn’t count as a “short” table.
- I’ve collected stats on the table (and there are no indexes) so the optimizer doesn’t need to do any dynamic sampling to generate an execution plan.
- This is 11g, so there are no SQL Plan Directives in place to force dynamic sampling.
So here’s the question: how many “table scans (long tables)” will Oracle record against my session when I re-run that query against v$mystat ?
Warning – this IS a trick question.
Update number 1
I said it was a trick question and, as you will have seen if you’ve had time to read the comments, the answer is going to depend on various configuration options. Here’s what I got in my test – and I’ve reported not just the session stats, but the segment statistics (v$segment_statistics):
NAME                                  VALUE
-------------------------------- ----------
table scans (short tables)                9
table scans (long tables)                26
table scans (rowid ranges)               26
table scans (cache partitions)            0
table scans (direct read)                 0
table scan rows gotten                 9331
table scan blocks gotten                954

7 rows selected.

SQL> select owner, object_name, object_type, value
  2  from   v$segment_statistics
  3  where  owner          = 'TEST_USER'
  4  and    statistic_name = 'segment scans'
  5  and    value         != 0;

OWNER           OBJECT_NAME          OBJECT_TYPE             VALUE
--------------- -------------------- ------------------ ----------
TEST_USER       T1                   TABLE                      26

1 row selected.
I recorded 26 long table scans – and that 26 was echoed in the segment statistics. (Note: I don’t normally use v$segment_statistics, which is a join between a couple of in-memory structures and three real tables; I usually query v$segstat.) It’s the segment statistics that made me pause in a recent problem review: up to that moment I had been using the “Segments by Table Scans” section of the AWR Report as a useful (but badly named, since it also counts (most) index fast full scans) indicator of a potential threat, and then I suddenly realised that there was yet another important detail I had to check before I could determine what the numbers were telling me.
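For reference, a direct query against v$segstat (avoiding the extra joins that v$segment_statistics does for you) might look like the following sketch; the join to dba_objects is just one convenient way to translate obj# into a name, and is deliberately simplified:

```sql
--
-- Sketch: query v$segstat directly and translate obj# via dba_objects.
-- (v$segment_statistics does this translation for you, at the cost of
-- joining a couple of in-memory structures to three real tables.)
--
select
        o.owner, o.object_name, o.object_type,
        st.statistic_name, st.value
from
        v$segstat   st,
        dba_objects o
where
        o.object_id       = st.obj#
and     st.statistic_name = 'segment scans'
and     st.value         != 0
;
```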
So the next question is WHY has Oracle reported 26 tablescans ?
Update number 2
Martin Preiss (see comments) has been working hard to investigate this, and has managed to produce a couple more variations in statistics for “the same” problem. In my case I had declared the table as “parallel 2”. For a table of degree N Oracle is very likely to break the table into 13*N chunks – there’s a historic reason for 13 – and that’s why I see 26 “rowid range” scans which, unfortunately, are also recorded as “long” table scans even though each is individually well short of the short table limit. This is really annoying when you start to look at the “Segments by Table Scans” report because you now don’t know how many times a table really was scanned unless you know something about the run-time degree of parallelism involved.
If you check the hidden parameters you will find several parameters relating to this chunking, in particular:
_px_min_granules_per_slave    minimum number of rowid range granules to generate per slave (default  13)
_px_max_granules_per_slave    maximum number of rowid range granules to generate per slave (default 100)
Technically, therefore, Oracle MIGHT get to 100 granules per parallel slave, and a tablescan at degree 2 could be recorded as 200 tablescans!
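If you want to check these values on your own system, the traditional query against the x$ structures looks something like this sketch; it requires a SYS connection, and the x$ structures are version-dependent and unsupported:

```sql
--
-- Sketch (connect as SYS): list the parallel granule parameters.
-- The escape clause makes the leading/embedded underscores literal.
--
select
        x.ksppinm   name,
        y.ksppstvl  value,
        x.ksppdesc  description
from
        x$ksppi  x,
        x$ksppcv y
where
        x.indx = y.indx
and     x.ksppinm like '\_px\_%granules%' escape '\'
;
```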
Martin asked why my 26 tablescans didn’t show up as “table scans (direct read)” – but that’s 11g for you, it allows a serial tablescan to use direct path reads, and it allows parallel tablescans to read into the cache, and the combination of cache size and table size meant that my example just happened to read into the cache.
And that introduces ANOTHER interpretation problem – what annoying things might I discover if I declare the table with the CACHE option (as Ivica suggested in his first comment below)?
Final Update (probably)
Ivica has been busy checking the CACHE option and has shown that there are a number of cases to consider (more, in fact, than I was planning to mention – and the variations in the combinations of NOPARALLEL and CACHE should give you some pause for thought). The only point I wanted to make was the effect of enabling PARALLEL and CACHE; I don’t think this is a combination that is very likely to appear in a production system, but boundary conditions (and accidents) do occur. With my little sample – even after I grew the table to be MUCH larger – the tablescan DIDN’T get reported: here are some session stats and segment stats (taken using my snapshot code) of a single tablescan running parallel 2 when the table was cached:
Name                                     Value
----                                     -----
table scans (rowid ranges)                  26
table scans (cache partitions)              26
table scan rows gotten                 164,672
table scan blocks gotten                29,611

Statistic                                Value
---------                         ------------
T1
  logical reads                         30,272
  physical reads                        29,614
  physical read requests                   261
I know I’ve done a tablescan in 26 pieces (parallel 2) – and scanned 29,000+ blocks doing it; but according to the segment stats AND session stats I haven’t done a tablescan. Fortunately, of course, I can see the parallel tablescan in the session stats, and in this isolated case I can see from the “cache partitions” statistics that that tablescan was on a cached table. But if I’ve been depending on the segment stats to tell me about which tablescans happen most frequently and do most work I’ve found another reason why I can’t trust the stats and have to do more work cross-checking different parts of the AWR for self-consistent results.
I don’t expect to add any more comments about this mixture of tablescans and parallelism, with the varying effects on the session and segment statistics – but there’s always the possibility that one day I’ll start to worry about how the KEEP and RECYCLE caches (db_keep_cache_size and db_recycle_cache_size) could confuse things further.
Footnote:
It’s probably worth pointing out that the segment statistics have never recorded SHORT tablescans; they’ve only ever captured details of LONG tablescans. There is a slight inconsistency here, though, since they capture all the “index fast full scans (full)” reported in the session stats whether those scans are short or long – not that the session stats record the difference. Unfortunately, another flaw creeps in: parallel index fast full scans “disappear” from the segment stats, although they show up in the session stats as “index fast full scans (rowid ranges)” and “index fast full scans (direct read)” with the same multiplier of 13 that we see for parallel table scans.
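If you want to watch the index fast full scan counters for your own session, the same v$mystat query used above works with a different filter (statistic names as they appear in 11g):

```sql
--
-- Session-level index fast full scan counters.
--
select
        sn.name, ms.value
from
        v$mystat   ms,
        v$statname sn
where
        ms.statistic# = sn.statistic#
and     sn.name like 'index fast full scans%'
;
```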
I didn’t check, but maybe 0 if the full table scan uses direct path reads (i.e. if the scan appears under “table scans (direct read)”)?
Comment by Martin Preiss — January 29, 2016 @ 2:18 pm GMT
Martin,
That’s an interesting thought, and given the number of possible ways in which Oracle can exercise a full tablescan one that’s worth checking.
Comment by Jonathan Lewis — January 29, 2016 @ 5:28 pm GMT
Jonathan,
it seems that serial full table scans using direct path reads are counted twice: as table scans (long tables) and as table scans (direct read). At least that’s what I see in 11.2.0.1:
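A sketch of the sort of test that shows this behaviour; note that “_serial_direct_read” is a hidden, unsupported parameter (and the set of legal values varies by version), so this belongs on a test system only:

```sql
--
-- Force serial direct path reads for the session (hidden parameter,
-- unsupported; value 'always' valid in later 11.2 patch sets),
-- run the scan, then see which counters have moved.
--
alter session set "_serial_direct_read" = always;

select count(*) from t1;

select
        sn.name, ms.value
from
        v$mystat   ms,
        v$statname sn
where
        ms.statistic# = sn.statistic#
and     sn.name like '%table scan%'
;
```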
That, at least, is something I had never thought about.
Regards
Martin
Comment by Martin Preiss — January 29, 2016 @ 6:34 pm GMT
Martin,
One of the common problems with reading AWR / Statspack reports is working out whether two numbers should be added, or if one of them is a subset of the other, or whether there are so many possible code paths that there is a partial overlap between the two figures.
Comment by Jonathan Lewis — January 29, 2016 @ 7:32 pm GMT
I created a table the same way Martin did. When I tested, I found some interesting results:
Comment by Arian — January 29, 2016 @ 7:45 pm GMT
Arian,
Another interesting little detail – whenever you start to look closely you almost always end up discovering something new.
I haven’t noticed this happening before, but my starting hypothesis would be that Oracle decided the table was a short table based on the object statistics (user_tables.blocks) – but that the run-time tablescan showed Oracle that (thanks to the randomised formatting due to ASSM) the HWM was a lot further out than the used block count indicated. It would have been interesting to see the number of tablescan blocks read – did Oracle change the way it counted when it moved from cache reads to direct path reads?
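One quick way to compare the two views of the table’s size is to put the optimizer’s figure next to the space allocated to the segment; a large gap hints that the HWM is well beyond the used-block count the stats reported:

```sql
--
-- Optimizer's view of the size (from gathered stats) ...
--
select blocks from user_tables   where table_name   = 'T1';

--
-- ... versus the blocks allocated to the segment.
-- (The HWM sits at or below this allocated figure.)
--
select blocks from user_segments where segment_name = 'T1';
```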
Comment by Jonathan Lewis — January 29, 2016 @ 7:59 pm GMT
Hi Jonathan,
the measurement does seem to be a bit funny. I tried to trace both statements, but with trace enabled it always counts a “table scans (long tables)” for both statements. Without tracing, it counts a “table scans (short tables)” for the uncommented query and a “table scans (long tables)” for the commented query.
I will look into this a bit more – but first I’m trying to find out where the 26 segment scans come from.
Comment by Arian — January 29, 2016 @ 8:17 pm GMT
If the table is cached with “alter table … cache” it is (forcibly) treated as a short table, and therefore you should see 0 for “table scans (long tables)”.
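A minimal sketch of the test – the CACHE column of user_tables confirms the attribute has been set before you re-run the scan:

```sql
alter table t1 cache;

--
-- Verify the attribute (CACHE column shows 'Y' when set),
-- then run the scan and re-check the session counters.
--
select table_name, cache from user_tables where table_name = 'T1';

select count(*) from t1;
```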
Ivica
Comment by Ivica Arsov — January 29, 2016 @ 4:14 pm GMT
Ivica,
Interesting – I hadn’t noticed that before (I think). I wonder how many other related statistics change at the same time – and does it report as a “short table”, or does it simply disappear? And if so, under what circumstances?
Comment by Jonathan Lewis — January 29, 2016 @ 5:31 pm GMT
Hi Jonathan,
This is what I’ve discovered from my morning test cases :)
It depends on the table definition: whether it is defined as parallel/cache, parallel/nocache, noparallel/cache, or noparallel/nocache.
1. Table with noparallel definition:
1a) CACHE: table scans should appear as table scans (short tables)
*However, if table blocks (after gather_table_stats) > (_small_table_threshold * 5), table scans will appear as table scans (cache partitions)
1b) NOCACHE: table scans should show as table scans (long tables)
2. Table with parallel definition, as in your case with “parallel 2”
2a) CACHE: table scans should show as table scans (cache partitions)
table scans always appear as (cache partitions) regardless of “table blocks (after gather_table_stats) > (_small_table_threshold * 5)”
2b) NOCACHE: table scans should show as table scans (long tables)
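The four combinations in the list above can be cycled through with a few alter statements; this sketch shows one pass (repeat for each combination, checking which “table scans …” statistic moves each time):

```sql
--
-- Set one combination (here: parallel / cache), run the scan,
-- then check the session counters.
--
alter table t1 cache;
alter table t1 parallel 2;

select /*+ full(t1) */ count(*) from t1;

select
        sn.name, ms.value
from
        v$mystat   ms,
        v$statname sn
where
        ms.statistic# = sn.statistic#
and     sn.name like 'table scans%'
;

--
-- Then repeat for the other combinations, e.g.:
--
alter table t1 nocache;
alter table t1 noparallel;
```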
Comment by Ivica Arsov — January 30, 2016 @ 12:18 pm GMT
For 958 blocks, the count of 5143 is rather low. That means that there are a lot of null values in the extents column, or there are a lot of (mostly) empty blocks. (Assuming a regular 8K block size)
I am still considering if and how this can influence the “table scans (long tables)” count.
Comment by Arian — January 29, 2016 @ 5:23 pm GMT
Arian,
Very observant – it was a copy of the dba_segments table so extents was never null; but I’d forgotten that I moved the table after changing pctfree to 90 – the table was left over from another test where I’d been playing around with row migration etc.
Comment by Jonathan Lewis — January 29, 2016 @ 5:33 pm GMT
after I realised that I didn’t have a clear idea of the exact semantics of “table scans (rowid ranges)” I looked at https://docs.oracle.com/cd/B28359_01/server.111/b28320/stats002.htm#i375475 and found there “During parallel query, the number of table scans conducted with specified ROWID ranges”. So maybe the 26 is the number of parallel slaves accessing the table?
Comment by Martin Preiss — January 29, 2016 @ 8:54 pm GMT
with parallel query I had to add some rows before the table scan was again classified under “table scans (long tables)”:
So the 26 is apparently not the number of parallel processes, but rather the number of rowid ranges scanned by the parallel slaves. Taking a look at the SQL Monitor report, the 26 appears as the number of Execs.
So it seems that parallel query ruins this statistic…
Comment by Martin Preiss — January 29, 2016 @ 9:19 pm GMT
but then – why does your result not contain “table scans (direct read)”. Obviously I am missing something…
Comment by Martin Preiss — January 29, 2016 @ 9:22 pm GMT
rebuilding the example with the simplest setup I finally get your numbers:
Comment by Martin Preiss — January 29, 2016 @ 9:34 pm GMT