The following request appeared on the Oracle Forum a few days ago:
I have a select query
select col1,col2,col3 from table1 order by col1,col3
This table contains 4.5 million records initially and the select was returning records in less than 2 minutes .
This query has been running for the last year with out any issues, but from last week onwards this query is taking more than 15 min to complete . No change in database and other components.
How can I find out the root cause of this issue ? Any specific area I need to check ?
There are two important points in this request – first the statement that “nothing changed”, and second the need to discover “what was happening in the past”.
Obviously something did change. It may have been the statistics, it may have been both the data and the statistics. (And it may have been the index definitions that changed, or an index may have been rebuilt, or some parameter may have been adjusted).
Since the query is now persistently slow, the most innocent explanation is that the data has changed; the second most innocent explanation is that the statistics have changed. The former could change the execution time, the latter could change the execution plan – which could be a poor plan and change the execution time.
The query (as supplied) is not selective. It’s got to acquire 4.5 million rows and supply them in order. There are only a few execution paths for the query, and these depend on the available indexes and constraints:
- Full tablescan with sort
- Index fast full scan with sort (if an index holding all three columns exists)
- Index full scan with no visit to table and no sort if the index in (2) starts (col1, col3)
- Index full scan with table access by rowid (and no sort) if an index starts (col1, col3) but doesn’t hold col2
Increase the amount of data and the same plan could take a lot longer: a sort might spill to disk instead of completing in memory; a full scan with or without table access might lose a caching benefit if the index (option 3) or table (option 4) cannot be kept in memory while the scan progresses.
So how you do find out what was happening when things went well ? Possibly the documentation includes a statement of the expected execution plan. Possibly you’ve been running statspack or AWR and this was a query that was captured with its execution plan. Possibly the only thing you can do is guess.
Work out all the possible execution plans for the query, then use hints to check the performance for each execution plan. If one of the plans drops back to 2 minutes, then maybe that’s the plan you used to have and you work out the root cause by comparing the old plan, the new plan, and the costs.
If none of the other plans gets you close to the original execution time, then something has changed on your system. (Although, possibly, you aren’t doing the tests fairly).