Oracle Scratchpad

November 13, 2006

Distributed Oddity

Filed under: distributed,Infrastructure,Troubleshooting — Jonathan Lewis @ 9:55 pm GMT Nov 13,2006

An interesting side-effect of blogging (rather than writing up a more formal document for my web site) is that it feels perfectly reasonable to throw out some odd observations without necessarily having to supply an explanation or solution.

So here’s the bare bones of an oddity relating to distributed queries that someone emailed me a couple of days ago:

INST1 -  Insert data into table X and commit 
INST2 -  select from X@DB1,dual;     

INST2 does not necessarily see the data from INST1, although if you wait at least three seconds, then it will see the data.

Note particularly that the query from INST2 joins two tables; if you don’t do a join then INST2 will see the data.
If you repeat the query INST2 will see the data on the second execution
If you issue a (redundant) commit or rollback before querying then INST2 will see the data.
If you wait about 60ms before querying INST2 may see the data.
If you wait for 3 seconds then INST2 will see the data.

The three seconds example makes me think it is related to re-synching SCNs between distributed databases – but which process is responsible for doing this ?  But the fact that the failure only happens with a join seems very odd.

I passed this oddity to the Oak Table to see if they had any comments – apart from reproducing the general case, a couple of them came back with a couple of associated phenomena.

One Oakie came up with an example where the two databases were running under different operating systems – one HP-UX, the other AIX – the problem simply did not appear.

Another Oakie came up with the idea of truncating the table on the first instance – and it still took the second instance some time before it noticed that the data had disappeared.

Maybe there’s something on Metalink, or in the latest patch releases that explains this. But there isn’t enough time to research every little oddity. But if anyone happens to have an explanation (or a bug number) for this anomaly, feel free to post it in and tell the rest of us.

[Further reading on distributed databases]


  1. Just noticed that the same happens by swapping the versions – INST1 on 10g, INST2 on 9i.

    But I think that this is unavoidable, since you can’t have perfect read consistency between two different databases, since they don’t share a common “time reference frame” – by the time the statement reaches the remote database, the remote time/SCN has changed (speed of light being finite …).

    It is simply not possible to map the SCNs on the two databases, unless they both freeze their scn, exchange their currect values, and then unfreeze the SCNs – which is simply not scalable; this test case is the perfect counterexample that this does not happen.

    So after the Schrodinger effect, we have an Einstein effect as well :)

    Comment by Alberto Dell'Era — November 13, 2006 @ 10:53 pm GMT Nov 13,2006 | Reply

  2. Just thinking aloud here..
    Looks like a similar issue to
    Bug 611416
    which has been accepted as a limitation.

    Comment by Fairlie Rego — November 14, 2006 @ 1:55 am GMT Nov 14,2006 | Reply

  3. Fairlie, thanks for that note. I think that is exactly the issue; and the comments (despite being 1999 and 7.3.4) are still apt.

    Alberto, I agree with the general premise – but you don’t have to freeze SCNs, you only have to move the lower SCN to at least the value of the higher SCN – which could be done in one round trip from the caller. This is (presumably) why the commit/rollback trick works.

    Incidentally, there is an interesting underlying issue anyway. A distributed join that operates as a nested loop need not give a read-consistent result, as the repeated execution of the query that gets second rowsource may operate at a different SCN every time – unless you start with “set transaction readonly” or do something with the isolation level. So perhaps the ‘solution’ to the problem is to remind people that they need that first step anyway, as that step probably performs the required synchronisation anyway.
    (That’s also “thinking out loud” – to be tested, not trusted).

    Comment by Jonathan Lewis — November 14, 2006 @ 11:43 am GMT Nov 14,2006 | Reply

  4. “set transaction readonly” could not work due to Bug No. 3258015

    “Fixed in Product Version: No Data”

    seems to be very interesting for this argument the following metalink note/bug:

    Comment by Alessandro Deledda — November 14, 2006 @ 12:43 pm GMT Nov 14,2006 | Reply

  5. We had a developer here at work that always committed his commits twice. We laughed at him but maybe he was onto something! Maybe he ran into this “undocumented feature” but his reasoning was always “to make sure it is committed”.

    Thanks for blogging. Always something new and interesting.

    Comment by Glenn — November 14, 2006 @ 1:55 pm GMT Nov 14,2006 | Reply

  6. I once worked with an Irish developer who also committed his commits twice.

    When I asked why he replied ‘To be sure, to be sure’.

    Alternatively, is this an example of ‘two-phase commit’ ?

    Comment by Andy C — November 14, 2006 @ 8:40 pm GMT Nov 14,2006 | Reply

  7. you don’t have to freeze SCNs

    Say the local instance receives a message from the remote one:
    “my current SCN is 123”
    actually the meaning is
    “my current SCN was 123, n milliseconds ago”
    where “n milliseconds” is the network latency, which is unknown [but even if it were known, it wouldn’t add information – since the remote SCN when the message is received can be anything between 123 and 123+(some thousands)].

    Moving the lower SCN to the value of the higher SCN is just (thinking out loud as well) just a clever trick to avoid maintaining a remote-local SCN map; it can’t cure the fundamental (physical) problem outlined above.

    Without freezing SCNs (stopping time on the two physical reference frames) – I can’t imagine a way to synchronize perfectly the SCNs on the two instances. So, perfect distributed read consistency cannot be achieved (again, without freezing, which is something very unscalable and prone to disaster if the network goes down just after freezing).

    Comment by Alberto Dell'Era — November 14, 2006 @ 9:57 pm GMT Nov 14,2006 | Reply

  8. Alberto, I agree – perfect synchronization is not possible; but on a superficial examination moving to the higher SCN would appear to avoid the original problem.

    I start a query at SCN 500, and move to the remote site – if your SCN is 510 I return to the local site, jump the SCN and restart my query. If your SCN is 490 you jump your SCN and I continue.

    “Superficial” because of the complexities of high concurrency, queries originating at both ends simultaneously, three-node queries and network latency issues. The strategy of synchronising at the end seems to be a sensible compromise between closest match and least contentious.

    Comment by Jonathan Lewis — November 15, 2006 @ 7:58 am GMT Nov 15,2006 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by