Oracle Scratchpad

June 1, 2010

I wish .. (1)

Filed under: Oracle,SQL Server,Wishlist — Jonathan Lewis @ 7:45 pm BST Jun 1,2010

I pointed out some time ago a few of the things in SQL Server that I would like to see in Oracle. Here’s a couple more:

Executions Plans

Insert, update, and delete execution plans show the indexed access paths used to check the side effects of referential integrity constraints. (I haven’t  checked to see what happens if  you define a foreign key as  ‘on delete cascade’ so I don’t know if SQL Server follows all the way down through the  delete tree – but I wouldn’t be surprised if it did.)

SQL Server also tells you about indexes that have to be modified even if your DML doesn’t require any referential integrity checks. Here’s one line extracted from a plan that demonstrates the point. I have a statement that is updating a column in the transactions table but, as you can see, two other objects are referenced at the same time – the two indexes that have to be updated:

|--Table Update(OBJECT:([testdata].[dbo].[transactions]), 
	SET:([testdata].[dbo].[transactions].[error] = raiseIfNullUpdate([Expr1004]))) 

If nothing else, this will remind people that when they insert a row there may be a lot more work going on than just finding a block with a bit of free space somewhere in the table.


This one is more “nice to have” than “extremely useful”, and might not be worth the return on investment to the Oracle developers, but SQL Server has a lovely little trick for handling concurrent tablescans.

If a process starts to do a tablescan and a second process starts to scan the same table a little while later the second (and Nth) process will deliberately synchronise with the first process by starting its tablescan at the point that the first process has just reached – allowing both processes to take advantage of the same block reads at the same time.

When the processes get to the end of the table, the second process will then go back to the start of the table and read the blocks it originally skipped. The benefit is that you don’t have several processes doing physical multiblock reads in different parts of the table at the same time and causing unnecessary contention and queues for disks as they do so. (Of course, in many cases in Oracle, the second process may benefit from visiting some blocks that were buffered by the first process – so you often see the second process “catch up” the first process and then spend its time waiting on wait event “read by other session” (one of the “buffer busy waits” class of wait events in earlier versions of Oracle) – which has the same effect as SQL Server, but it’s not by design and it’s not guaranteed.

This little trick has earned the name “merry-go-round” tablescan, and you can understand why if you imagine several processes all starting to scan the same table one after the other – you could cause a continuous cyclic scan to take place as one process drops out and another takes over.

The reason why I rate this as “nice to have” is that (a) you wouldn’t normally expect to be doing many concurrent tablescans like this in Oracle and (b) if you were you’d probably be doing it with parallel queries in a data warehouse anyway – which might make the merry-go-round very hard to implement safely.


  1. My immediate thoughts on reading about the “merry-go-round” tablescan is that I have recently seen code on a live Oracel DW system that would probably benefit from this – but only because the same, inefficient query gets run in quick succession, scanning a whole partitioned table {where a little effort on the coding side would give partition exclusion}.

    Then I asked myself the “but what happens when…” question. You mention parallel being a complication but so would RAC {if different nodes are executing the same query}, and if Exadata is in the frame then it would need to take that into account too.

    On the other hand, if you had the data on readonly tablespaces, then it should simplify things (no need to be concerned with read-consistency if all data is from RO tablespaces)…

    I wonder if this is the sort of thing the core RDBMS developers in Oracle argue about all the time, whether it is worth programing for these special cases.

    After all, in Oracle if you select min(ID) or max(ID) from a table with an index on ID, the CBO does a fast min/max scan on the index. But select min(ID),Max(ID) from the table, it does a full index scan. That seemed a very obvious improvement to code for to me but it was not in 10. (Not sure about 11 but I am willing to bet a small amount of money it still does a fast full index scan).

    Comment by mwidlake — June 1, 2010 @ 9:24 pm BST Jun 1,2010 | Reply

  2. There is one thing in SQL Server that Oracle needs to implement sooner rather than later: the ability to have one set of redo logs for each of the “databases”.
    In Oracle that would translate to having multiple sets of redo logs and assign logins/schemas/whatever-Oracle-calls-them-now to each set. Very much like the current assigning of users to specific temporary tablespaces.
    Otherwise the concept of consolidation of databases and running multiple applications in a single database – essential for cloud computing and “db-as-a-service” – goes out the window.
    There are a few other things needed as well, but this one is fundamental.

    Comment by Noons — June 1, 2010 @ 9:33 pm BST Jun 1,2010 | Reply

  3. In regard to Noons’ comment, an obvious problem would be if a table is created (or even just extended) in one schema, that is reflected in the database’s system tablespace. With one system tablespace, you have one redo log.

    I think this works for SQL Server because they have the concept of a ‘database’ at a level higher than a schema, but below Oracle’s concept of a database. Sort of like a service but at a physical level.

    If Oracle wanted to go this route, I’d expect something more like an instance manager that co-ordinates multiple instances on the same server, dynamically balancing resources between them.

    Comment by Gary — June 1, 2010 @ 10:57 pm BST Jun 1,2010 | Reply

  4. Having multiple instances totally defeats the purpose of consolidation: saving server resources.

    There is absolutely no conflict whatsoever with system tablespace and multiple redo log sets: the data in SYSTEM belongs to SYS and SYSTEM and those can have a default set of redo logs for their purposes.

    If a schema creates a table, that goes in the system redo logs.
    If the same schema loads data in that table, that goes in the schema’s redo log.
    If the loading of data causes a change in system metadata – an additional extent, for example – then the metadata change is written to system redo log: it is a change to a SYS table, not a schema table.

    Each to its own, and a set of metadata to say what is where.

    Hence if that schema now needs a recovery, SYS redo log metadata knows it’s in schema’s redo log number #whatever. And of course this redo metadata is protected by the system’s default redo log.

    SQLServer implements this really well and flexibly. But it fails for example in not providing for an easy way to implement multiple temp “databases”, which basically means no multiple temp tablespaces…

    Ah well: no one is perfect. :)

    Comment by Noons — June 2, 2010 @ 2:54 am BST Jun 2,2010 | Reply

    • Noons,

      SQL Server has a sys account for every database which records object definitions and space allocation for the database, and this does leave an important area that needs some careful review if you want one instance to handle multiple databases. (And that, basically, is probably all you want – one instance with a single shared global, but one lgwr and one dbwr per database with a facility for rationing major memory allocations per database).

      One thing that worries me about SQL Server and its multiple databases is that you can have a single transaction that is allowed to update multiple databases or, at a less threatening level, update one database based on a query from another – so that a simple recover of one database may still leave you with inconsistent databases if you can’t recover to “present”.

      Comment by Jonathan Lewis — June 2, 2010 @ 7:29 am BST Jun 2,2010 | Reply

      • You got it in one. I’d skip the multiple dbwriter gladly. It’s the single set of redo logs that worries me: consolidating two heavy redo log apps into one instance will kill Oracle, while SS remains more or less unaffected – assuming sufficiently separate disk and within obvious reasonable limits; I am *not* suggesting dumping amazon and ebay in one instance!
        Yes yes, I know: “redo in SSDs fixes all that”. Right…

        But I’d like to see also more flexible control of resource allocation: IME, the resource limiter in Oracle is not the best of “schedulers”. It does the job but not with sufficient “fine grain” control.
        Then again: neither is SS much better at the apportioning of server power and memory to each db, although it has solved the redo problem.

        On the updating of two databases: that is not just SS, surely. If I do a distributed update in Oracle, I am up to the same problem with a single instance recovery. This is where human intervention and reasoning would be required, nothing wrong with that.

        We run >10 “databases” in every single one of our SS instances and of course we have to be careful to more or less isolate them. You are absolutely right: the risk is there for this to happen. Managing security becomes critical in our environment, to avoid any risk of a login having too much power to create havoc.

        Comment by Noons — June 2, 2010 @ 2:16 pm BST Jun 2,2010 | Reply

        • How do you avoid the risk of programmer-controlled locking strategies making redo contention look like a walk in the park?

          Comment by joel garry — June 3, 2010 @ 10:17 pm BST Jun 3,2010

        • @Garry:
          very simply, I don’t let it happen.
          Programmers trying to access data cross-dbs or cross-schemas can only succeed if they are authorized to do so. That’s why I said the security becomes critical in these environments: relax it and you are in for a nasty surprise.
          SS has extensive control of security at both database and server level. Oracle allows this: “alter session set current_schema=xxxx” in a login trigger, just as an example. And a few other things, as of course you are aware. So rather than excluding dbs/schemas by segregating into instances, I exclude them by security. Works very well indeed.

          Comment by Noons — June 4, 2010 @ 1:45 am BST Jun 4,2010

        • Noons,

          It is interesting to note, though, that some of the very large databases that use SQL Server deal with the scalability problem by partitioning their main tables by using separate databases for separate partitions – so you get, for example, one database per day and have to write code that queries across databases.

          Comment by Jonathan Lewis — June 4, 2010 @ 8:40 am BST Jun 4,2010

      • I’ve played with both SQL Server and MySQL quite a bit and to me, MySQL uses a similar model to SQL Server. You have one set of processes managing multiple databases. I haven’t peaked lower to see what it does at the log level. I guess I probably should now that MySQL is part of the Oracle family.

        Comment by Tom — June 2, 2010 @ 3:41 pm BST Jun 2,2010 | Reply

  5. “…Of course, in many cases in Oracle, the second process may benefit from visiting some blocks that were buffered by the first process – so you often see the second process “catch up” the first process and then spend its time waiting on wait event “read by other session” (one of the “buffer busy waits” class of wait events in earlier versions of Oracle) – which has the same effect as SQL Server, but it’s not by design and it’s not guaranteed. ”

    I quite often noticed this – watching V$SESSION_LONGOPS for a specific interval – that two sessions, which start a Table Scan shortly after another, this “catching up” you describe here.
    Observing SOFAR (or MESSAGE) for a while, almost *always* the both long operations come to sync exactly (the second has less blocks read at the beginning, after a while the second session shows exactly the same number of blocks read sofar), one of the sessions (might also be the first which started) mostly spending on wai event “read by other session”

    So I always assumed: this *is* the design to handle that.

    Comment by Sokrates — June 2, 2010 @ 8:23 am BST Jun 2,2010 | Reply

  6. Random thoughts on the tablescan issue …

    1. It could be more complex than this where the queries have different consistency point, of course. The two queries may have to access different undo blocks to get a data consistent with their own requirement. Not an insurmountable problem though.

    2. I think I’m right that in 11g a non-parallel query could be using serial reads as well, so it wouldn’t just be parallel queries in DW environments that need a different treatment.

    3. Where I have seen multiple simultaneous tablescans or the same segment(s) the usual symptom has been a rise in “waiting for buffer” events (I forget which one off the top of my head, and I think they changed in 10g?) because the “trailing” process catches up with the process that is performing the disk access because the logical reads are much faster than the physical reads. I usually regard such waits as not-a-problem.

    4. This seems like it would be _more_ critical in a parallel query DW environment (or a serial-direct-read environment) because unless there is caching outside the database …
    4a. the physical reads of a trailing process(es) make it much less likely to catch up with the leading process. Maybe Exadata has a mechanism that helps with this, albeit indirectly … aren’t some parallel queries put temporarily on hold when the system is busy?
    4b. a trailing query _generally_ cannot benefit from the work done by the leading query because blocks _generally_ go to PGA, not into the shared buffer.

    Comment by David Aldridge — June 2, 2010 @ 8:52 am BST Jun 2,2010 | Reply

    • “…It could be more complex than this where the queries have different consistency point, of course….”

      Does SQL Server support read consistency, by the way ?
      (I came only in touch with SQL Server several years ago in one project and therefore have no knowledge at all about it, so Jonathan, I highly appreciate your writing on your studies about it therefore)

      Comment by Sokrates — June 2, 2010 @ 9:08 am BST Jun 2,2010 | Reply

      • They added Consistency in 2005 but it wasn’t the default. I tried it back then and I found the performance to be atrocious. Could be one of the reasons why they neglected to turn it on. Another could be the fact that many apps are written against SQL Server with that in mind and changing the way the DB works could have a drastic affect on those applications. I haven’t tried 2008 yet, guess I probably could (time permitting).

        Comment by Tom — June 2, 2010 @ 3:44 pm BST Jun 2,2010 | Reply

  7. Oh, I just thought of another thing about the tablescan — that Exadata has the curious disadvantage over non-Exadata when it comes to the “shareability” of query results because of column filtering and row filtering, which would have to be compatible between the leading and the trailing queries.

    Comment by David Aldridge — June 2, 2010 @ 8:58 am BST Jun 2,2010 | Reply

  8. Hi,

    And why not separate dictionary data about “database”. So, that things are self contained, independent. To have transportable “database” (including code, sequences etc). Availability – you don’t care if SYSTEM tablespace of “another database” is corrupted.

    Restore/recovery on “database” level. As consequence simplified / faster procedure for PITR.

    Of course, risk is to have costly dictionary queries, security concerns…

    Just my 2cents.


    Comment by Igor — June 2, 2010 @ 5:53 pm BST Jun 2,2010 | Reply

  9. IMHO … Oracle should implement the feature “Filtered Indexes” of SQL Server …

    Comment by Parthiban Nagarajan — June 3, 2010 @ 12:33 pm BST Jun 3,2010 | Reply

  10. I thought I read somewhere, maybe 6 months ago, that Oracle 11g and/or Exadata has merry-go-round tablescans.

    Of course, I can’t find it now…

    I know for sure I’ve heard of that concept before–and I never pay attention to SQL Server news so it couldn’t have been from there.

    Comment by Jason Bucata — June 5, 2010 @ 9:35 pm BST Jun 5,2010 | Reply

  11. […] under: SQL Server — Jonathan Lewis @ 6:13 am UTC Jun 18,2010 Continuing my little list of things in SQL Server that I’d like to see in […]

    Pingback by I wish (2) « Oracle Scratchpad — August 22, 2010 @ 2:23 pm BST Aug 22,2010 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by