Sometimes you find bugs on MOS (Metalink, OCIS, whatever) that make you feel positively ill. I’ve just been on a customer site where (in passing) they mentioned that one of their historic queries against v$sqlstats now tool just over one second (CPU) in 220.127.116.11 when it had previously taken about 200ms on 10.2.0.4***. After a little checking it seemed likely that the change was possibly related to the fact that they had increased the size of the SGA significantly, allowing for a much larger shared pool and library cache; however there have been numerous code changes in the shared pool area on the route from 10g to 11g, so I decided to check MOS to see if anyone else had seen a similar problem. I found this:
Bug 13550185 : QUERIES ON V$SQLSTATS MAY CAUSE CRASH OR HANG
Fixed in 12.1
Last updated: 29th Nov 2012
There is a documented workaround in the base bug – stop collecting SQL execution statistics, but this rather defeats the point of having AWR snapshot capturing all that “SQL ordered by … “ stuff. On the other hand, if some queries against v$sqlstats can cause sessions to crash or the database to hang then every AWR snapshot might be the one that hangs your database !
The bug in question references Solaris 10 on SPARC, but a quick search for the bug number finds 66 other entries in the bug database and the first one I looked at was on Redhat Linux 5; and the bug also describes itself as “platform generic”. On a positive note, the status is currently “80 – Development to QA/Fix Delivered Internal”, and when I raised the reported the threat to the client, they contacted their Oracle Support contact and discovered that there was a patch available already (number 15033625), and that they had already installed it as a side effect of installing a merge patch to fix a problem with corruption in the shared pool. If you search the patch lists on MOS you’ll find that there are patches for many different versions in the 18.104.22.168 and 22.214.171.124 range, for several different platforms.
*** An increase from 200 ms to 1 second may not seem something worth worrying about, but the client ran the query every 10 seconds because (as it said on the packet in 10g) v$sqlstats is latch free and very cheap to run and (b) they have SLAs which mean they basically need to know if any of their critical, high-frequency, queries are going bad within one minute of the problem starting to occur.