Oracle Scratchpad

May 11, 2015


Filed under: Oracle,Performance — Jonathan Lewis @ 9:24 pm BST May 11,2015

Here’s a question to which I don’t know the answer, and for which I don’t think anyone is likely to need the answer; but it might be an entertaining little puzzle for the curious.

Assume you’re doing a full tablescan against a simple heap table of no more than 255 columns (and not using RAC, Exadata, In-memory, etc. etc. etc.), and the query is something like:

select  {columns 200 to 250}
from    t1
where   column_255 = {constant}

To test the predicate Oracle has to count its way along each row column by column to find column 255. Will it:

  1. transfer columns 200 to 250 to local memory as it goes, then check column_255 — which seems to be a waste of CPU if the predicate doesn’t evaluate to TRUE
  2. evaluate the predicate, then walk the row again to transfer columns 200 to 250 to local memory if the predicate evaluates to TRUE — which also seems to be a waste of CPU
  3. one or other of the above depending on which columns are needed, how long they are, and the selectivity of the predicate

How would you test your hypothesis ?
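To help visualise the two candidate strategies, here is a toy model in plain Python. It has nothing to do with Oracle's actual internals; the length-prefixed encoding and every function name are invented for the illustration. It contrasts "copy the wanted columns as you go" with "test the predicate first, then walk the row again":

```python
# Toy model of a row piece: each column is stored as a one-byte length
# followed by the column data, so reaching column N means walking past
# columns 1..N-1. All names and the format are invented for this sketch.

def make_row(values):
    """Encode a list of byte-string columns as length-prefixed data."""
    row = bytearray()
    for v in values:
        row.append(len(v))
        row += v
    return bytes(row)

def walk_copy_as_you_go(row, wanted, pred_col, pred_val):
    """Strategy 1: copy wanted columns while walking, then test the predicate."""
    out, pos, col = {}, 0, 0
    while pos < len(row):
        col += 1
        ln = row[pos]; pos += 1
        if col in wanted:
            out[col] = row[pos:pos + ln]    # copied even if the predicate later fails
        if col == pred_col:
            return out if row[pos:pos + ln] == pred_val else None
        pos += ln
    return None

def walk_predicate_first(row, wanted, pred_col, pred_val):
    """Strategy 2: walk once to test the predicate, walk again to copy."""
    pos, col, matched = 0, 0, False
    while pos < len(row):
        col += 1
        ln = row[pos]; pos += 1
        if col == pred_col:
            matched = row[pos:pos + ln] == pred_val
            break
        pos += ln
    if not matched:
        return None
    out, pos, col = {}, 0, 0                 # second walk: copy only what's wanted
    while pos < len(row):
        col += 1
        ln = row[pos]; pos += 1
        if col in wanted:
            out[col] = row[pos:pos + ln]
        pos += ln
    return out
```

Counting the copy operations each function performs over rows that fail the predicate shows where each strategy wastes its effort: strategy 1 pays for copies it throws away, strategy 2 pays for a second walk on every row that matches.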


  1. Hi Jonathan,
    Most modern processors have superscalar/SIMD capabilities that let them apply the same operation across a whole array. Also, traditionally, Oracle has been very careful with memory, always trying to preserve as much of it as possible. My hypothesis would be that the conditions are evaluated on the SGA block copy and that only the necessary columns are transferred to the PGA.
    How would I test it? I would ask Jonathan Lewis, of course! Just kidding. I would write two queries: one selecting all 255 columns (the infamous “select *”) and the query you proposed. If the PGA expenditure does not differ greatly between the two cases, you’ve got your answer: all columns are transferred to the PGA.

    Comment by mgogala — May 11, 2015 @ 10:44 pm BST May 11,2015 | Reply

    • The point isn’t about what gets transferred into the PGA, but WHEN it gets transferred. Does Oracle transfer the data before it “knows” that it needs it, or does it scan the row twice to avoid a redundant transfer?

      Comment by Jonathan Lewis — May 18, 2015 @ 6:41 pm BST May 18,2015 | Reply

  2. How wonderful would it be if Oracle could scan the rows “backwards” (i.e. starting from the last column of each row and working backwards) to address this type of query efficiently when using a full tablescan?

    p.s. It is late at night and I am dreaming…clearly

    Comment by Narendra — May 11, 2015 @ 11:07 pm BST May 11,2015 | Reply

    • To do something like that you’d have to do something like keep an array of column start positions for each column in a row, or keep the column length at the end as well as the start of each column (if you’re going to let Oracle pick the best (forward/back) direction to read the row). That’s potentially a lot more storage or a lot more complication.

      Comment by Jonathan Lewis — May 18, 2015 @ 6:44 pm BST May 18,2015 | Reply

  3. Narendra, it wouldn’t be wonderful at all. Please remember that Oracle supports 1,000 columns, not just 255, and that only 255 columns can be stored in the same row piece. What you brand as “wonderful” would result in so much processing that an application would slow down to a crawl.

    It would be very nice if someone would make intelligent storage which automatically eliminated blocks with no rows satisfying the condition using, let’s say, a mathematical algorithm called a “Bloom filter”, and only returned blocks with a high probability of containing rows satisfying the condition; now that would be a great help. You could have, say, 14 such “smart storage nodes” utilizing Bloom filters and 8 dual-Xeon processor nodes organized as RAC to provide sufficient CPU power. One could connect the storage nodes to the database nodes using Infiniband, with a 40Gb/sec connection speed. Now, that would really help performance.

    In such a scenario one would not need to worry about wasting CPU power; there would be CPU power galore. Still, it would probably not be sufficient for the amount of processing needed for what you suggest.
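    For anyone unfamiliar with the mechanism being alluded to, here is a minimal sketch of the block-elimination idea in pure Python. The class and its API are invented for the illustration and bear no relation to how any real storage cell implements this; the point is only that each block can keep a small Bloom filter of its values, and a probe can then say “definitely not in this block” with no false negatives:

```python
# Minimal Bloom filter sketch (invented API, illustration only): a compact
# bit array built from a block's values; membership probes may give false
# positives but never false negatives.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1024, k=3):
        self.m, self.k = m_bits, k
        self.bits = 0                      # bit array held as one big int

    def _positions(self, value):
        # derive k bit positions from independent hashes of the value
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, value):
        for p in self._positions(value):
            self.bits |= 1 << p

    def might_contain(self, value):
        return all(self.bits >> p & 1 for p in self._positions(value))

# One filter per "block": a scan only needs to visit candidate blocks.
blocks = [["alpha", "beta"], ["gamma", "delta"], ["epsilon"]]
filters = []
for blk in blocks:
    f = BloomFilter()
    for v in blk:
        f.add(v)
    filters.append(f)

candidates = [i for i, f in enumerate(filters) if f.might_contain("gamma")]
```

    The trade-off is the usual Bloom-filter one: the occasional false positive (a block scanned unnecessarily) in exchange for never missing a block that really does contain a match.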

    Comment by mgogala — May 12, 2015 @ 12:09 am BST May 12,2015 | Reply

    • Only one item missing from your wish list – low-cost licences.

      The complications of how to read a row “from the end” keep mounting up when you start thinking about all the non-trivial cases – forget rows with 1,000 columns and the odd “smallest piece first” strategy, there’s probably enough complication introduced by the fact that Oracle doesn’t store trailing null columns.
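      To make the trailing-null complication concrete, here is a toy encoding in plain Python. The 0xFF “null marker” and everything else here is an invented convention, not Oracle’s block format; the one behaviour it borrows is that trailing NULL columns simply aren’t stored, so the physical row can end long before its last logical column:

```python
# Toy row encoding (invented convention): first byte is the number of
# columns actually stored, then length-prefixed values, with a 0xFF
# length byte standing in for an embedded NULL.

def encode_row(values):
    # trailing NULLs are not stored at all
    while values and values[-1] is None:
        values = values[:-1]
    row = bytearray([len(values)])
    for v in values:
        if v is None:
            row.append(0xFF)
        else:
            row.append(len(v))
            row += v
    return bytes(row)

def get_column(row, n):
    """Fetch column n (1-based) by walking forwards; any column past the
    stored count is implicitly NULL."""
    stored = row[0]
    if n > stored:
        return None
    pos = 1
    for col in range(1, n + 1):
        ln = row[pos]; pos += 1
        if ln == 0xFF:
            val = None
        else:
            val = row[pos:pos + ln]; pos += ln
        if col == n:
            return val
```

      Reading such a row backwards, the last stored entry could belong to almost any column number, because you can’t see how many trailing NULLs were dropped; only the forward walk, together with the stored column count, resolves it.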

      Comment by Jonathan Lewis — May 18, 2015 @ 6:47 pm BST May 18,2015 | Reply

  4. Hi Jonathan,
    I guess that Oracle uses approach 1. In-Memory provides additional functionality, and the plan reflects this new behaviour with an inmemory predicate in addition to a filter.
    Without In-Memory, I think we can try to check the hypothesis by creating a table with chained rows and evaluating the ‘table fetch continued row’ statistic after the tests. Additional reads should show up there if Oracle walks the rows again.

    Comment by Vyacheslav Rasskazov — May 12, 2015 @ 4:23 am BST May 12,2015 | Reply

    • The in-memory columnar store (and HCC compression in Exadata) make it worthwhile storing an array of pointers to the start of each column so that you can jump directly to just the columns in the WHERE clause first – but it’s worthwhile because each pointer points to an array of values for that column, not to just one value for one row. So we’re looking at a completely different problem there.

      The “table fetch continued row” statistic is an interesting idea – tempered only by two thoughts: first, that Oracle might pin the primary and “chained” blocks as it visits them, so we might not be able to tell exactly what it did with them; secondly (unfortunately), that Oracle doesn’t appear to be terribly consistent about how it reports continued row accesses.

      Comment by Jonathan Lewis — May 18, 2015 @ 6:53 pm BST May 18,2015 | Reply

  5. I get the feeling this is a John 6:6 situation: “But this He said to test him, for He Himself knew what He would do.” :)

    It seems like it would be very useful to make the transfer of a particular column as slow as possible. Maybe a large VARCHAR2 for columns 200-250? Or maybe all of them?

    While walking the chain of columns within a row, Oracle doesn’t actually need to memcpy() the bytes out of the block in the buffer cache into the PGA yet – it could just keep a data structure with offsets into the row (or the row fragment). But temporarily storing both an offset and the dba would often take more space than the actual data itself. So maybe it’s adaptive: it just grabs the data right then if it’s short enough, and adds a flag that it’s done so. But with suitably large dummy VARCHAR2 data, these kept offsets (if they exist) would be sure to kick in.
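    That adaptive idea can be sketched in a few lines of plain Python. The cutoff value and all the names are invented for the illustration, and there is no claim that Oracle does anything like this: short wanted columns are copied eagerly, long ones are remembered as (offset, length) pairs, and the deferred copies are paid for only if the predicate passes:

```python
# Sketch of "remember offsets, copy later" (everything here is invented):
# values at or under the cutoff are copied during the walk; longer ones
# are deferred and fetched only when the predicate turns out to be true.

SHORT_CUTOFF = 8  # arbitrary threshold for "just copy it now"

def make_row(values):
    """Encode byte-string columns as length-prefixed data."""
    row = bytearray()
    for v in values:
        row.append(len(v))
        row += v
    return bytes(row)

def scan_row(row, wanted, pred_col, pred_val):
    pending = {}   # col -> (offset, length): deferred copies
    out = {}       # col -> bytes: eager copies
    pos, col = 0, 0
    while pos < len(row):
        col += 1
        ln = row[pos]; pos += 1
        if col in wanted:
            if ln <= SHORT_CUTOFF:
                out[col] = row[pos:pos + ln]   # eager: short value
            else:
                pending[col] = (pos, ln)       # deferred: long value
        if col == pred_col:
            if row[pos:pos + ln] != pred_val:
                return None                    # long values never copied
            break
        pos += ln
    for c, (off, ln) in pending.items():
        out[c] = row[off:off + ln]             # pay the copy only now
    return out
```

    On rows that fail the predicate the long values are never copied at all, which is exactly the saving being speculated about here.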

    But your challenge to us is to come up with a test for these hypotheses. The performance would have to vary based on the contents of the row(s). I don’t think there’s anything else that we have control over that could alter the performance (underscore parameters and the like… if we knew of any then it seems we’d already know the answer to begin with).

    Well, maybe that’s not true… build an index on column 255 and hint the query to use it or a full tablescan, and compare timings? That would sidestep the need to evaluate column 255 for the predicate during the row processing. If we can satisfactorily eliminate the time to read and process the index block… Hmm, that might just be going deeper down the rabbit hole.

    So instead of the index, maybe doing a timing of reading the row(s) by rowid would yield something. To be fair, probably need two queries with AND ROWID IN (/*subquery or list of ROWIDs*/) and one of them adds the predicate on column 255, which is always satisfied in the rows we supplied ROWIDS for.

    So it seems I’ve got a lot of ideas but none of them especially useful yet. :-|

    Comment by Jason Bucata (@jb31842) — May 12, 2015 @ 5:27 pm BST May 12,2015 | Reply

    • Jason,
      just one addition: with large VARCHAR2 columns we would probably get some row chaining effects – and I guess this would change the whole picture.

      Comment by Martin Preiss — May 13, 2015 @ 10:57 am BST May 13,2015 | Reply

      1. I get the feeling this is a John 6:6 situation: “But this He said to test him, for He Himself knew what He would do.” :)

      Coincidentally it was only in the last couple of weeks that my followers reached the 5,000 mark!

      I think that a few large varchar2() columns might be the only way we could get a better idea of what happens – though perhaps one of the “debug / break / symtab” enthusiasts could just dump subroutine calls and search libraries. The aim would be to get Oracle to do the work of getting to the end of a long row to test a predicate, and then to see whether we could spot any difference in CPU or memory usage depending on whether the test succeeds or fails. It’s possible that a large number of very small columns might help, though, or a very large number of rows where we (in principle) do, or don’t, need the columns.

      I like the idea of accumulating a running list of the column positions rather than the column values. That seems to be a good compromise between doing something twice or doing it unnecessarily. But that might be even harder to prove.

      Comment by Jonathan Lewis — May 18, 2015 @ 7:07 pm BST May 18,2015 | Reply


Comments and related questions are welcome.
