Oracle Scratchpad

April 4, 2013

Delphix Overview

Filed under: Delphix — Jonathan Lewis @ 9:04 pm BST Apr 4,2013

Update: Here’s the link to the recording of the webinar

I’ll be online tomorrow morning (Friday 5th, 9:00 Pacific time, 5:00 pm UK) in a webinar with Kyle Hailey to talk about my first impressions of Delphix, so I thought I’d write up a few notes beforehand.

I’ve actually installed a complete working environment on my laptop to model a production setup. This means I’ve got three virtual machines running under VMware: my “production” machine (running Oracle 11.2.0.2 on OEL 5, 64-bit), a “development” machine (which has the 11.2.0.2 software installed, again on OEL 5, 64-bit), and a machine which I specified as Open Solaris 10, 64-bit for the Delphix server VM (pre-release bloggers’ version). The two Linux servers are running with 2.5GB of RAM, the Delphix server is running with 8GB RAM, and all three machines are running 2 virtual CPUs. (My laptop has an Intel quad-core i7 running two threads per core, 16GB RAM, and two 500GB drives.) The Linux machines were simply clones of another virtual machine I had previously prepared; the purpose of the exercise was simply to see how easy it would be to “wheel in” a Delphix server and stick it in the middle. The answer is: “pretty simple”. (At some stage I’ll be writing up a few notes about some of the experiments I’ve done on that setup.)

To get things working I had to create a couple of UNIX accounts for a “delphix” user on the Linux machines, install some software, give a few O/S privileges to the user (mainly to allow it to read and write a couple of Oracle directories), and a few Oracle privileges. The required Oracle privileges vary slightly with the version of Oracle and your preferred method of operation, but basically the delphix user needs to be able to run rman, execute a couple of Oracle packages, and query some of the dynamic performance views. I didn’t have any difficulty with the setup, and didn’t see any threats in the privileges that I had to give to the delphix user. The last step was simply to configure the Delphix server to give it some information about the Linux machines and accounts that it was going to have access to.

The key features of the Delphix server are that it uses a custom file system (DxFS, which is based on ZFS with a number of extensions and enhancements) and that it exposes files to client machines through NFS; and there are two major components to the software that make the whole Delphix package very clever.

Oracle-related mechanisms

At the Oracle level, the Delphix server sends calls to the production database server to take rman backups (initially a full backup, then incremental backups “from SCN”); between backup requests it also pulls the archived redo logs from the production server – or can even be configured to copy the latest entries from the online redo logs a few seconds after they’ve been written (which is one of the reasons for requiring privileges to query some of the dynamic performance views, but the feature does depend on the Oracle version).

If you want to make a copy of the database available, you can use the GUI interface on the Delphix server to pick a target machine, invent a SID and service name, and pick an SCN (or approximate timestamp) that you want the database to start from; within a few minutes the Delphix server will have combined all the necessary backup pieces, applied any relevant redo, and configured your target machine to start up an instance that can use the (NFS-mounted) database that now exists on the Delphix server. I’ll explain in a little while why this is a lot cleverer than a simple rman “restore and recover”.

DxFS

Supporting the Oracle-related features, the other key component of the Delphix server is the Delphix file-system (DxFS). I wrote a little note a few days ago to describe how Oracle can handle “partial” updates to LOB values – the LOB exists in chunks with an index on (lob_id, chunk_number) that allows you to pick the right chunks in order. When you update a chunk in the LOB Oracle doesn’t really update the chunk, it creates a new chunk and modifies the index to point at it. If another session has a query running that should see the old chunk, though, Oracle can read the index “as at SCN” (i.e. it creates a read consistent copy of the required index blocks) and the read-consistent index will automatically be pointing at the correct version of the LOB chunk. DxFS does the same sort of thing – when a user “modifies” a file system block DxFS doesn’t overwrite the original copy, it writes a new copy to wherever there’s some free space and maintains some “indexing” metadata that tells it where all the pieces are. But if you never tell the file system to release the old block you can ask to see the file as at a previous point in time at no extra cost!

But DxFS is even cleverer than that because (in a strange imitation of the “many worlds” interpretation of quantum theory) a single file can have many different futures. Different users can be identified as working in different “contexts”, and the context is part of the metadata describing the location of blocks that belong to the file. Imagine we have a file with 10 blocks sitting on DxFS – in your context you modify blocks 1, 2 and 3 while, at the same time, I modify blocks 1, 2 and 3 in my context. Under DxFS there are now 16 blocks associated with that file – the original 10, your three modified blocks, and my three modified blocks – and, depending on timestamp and context, someone else could ask to see any one of three different versions of that file: the original version, your version, or my version.
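
As a toy illustration (my own Python sketch, not Delphix code – all the names are invented), a context-aware copy-on-write block map might look like this: writes never overwrite, and a read resolves through the context’s private mapping before falling back to the original blocks.

```python
# Toy context-aware copy-on-write block map (illustrative only).
class CowFile:
    def __init__(self, n_blocks):
        # Physical blocks 0..n-1 hold the original file contents.
        self.physical = [f"base:{i}" for i in range(n_blocks)]
        self.base = list(range(n_blocks))   # logical -> physical mapping
        self.contexts = {}                  # context -> {logical: physical}

    def write(self, context, logical, data):
        # Never overwrite: append a new physical block, keep the old one,
        # and point this context's metadata at the new copy.
        self.physical.append(data)
        self.contexts.setdefault(context, {})[logical] = len(self.physical) - 1

    def read(self, logical, context=None):
        # A context's private copy wins; otherwise fall back to the base.
        overlay = self.contexts.get(context, {})
        return self.physical[overlay.get(logical, self.base[logical])]

f = CowFile(10)
for b in (1, 2, 3):
    f.write("yours", b, f"yours:{b}")
    f.write("mine", b, f"mine:{b}")

print(len(f.physical))       # 16 physical blocks: the original 10, plus 3 + 3
print(f.read(2))             # original version: 'base:2'
print(f.read(2, "yours"))    # 'yours:2'
print(f.read(2, "mine"))     # 'mine:2'
```

Running the 10-block example from the paragraph above leaves 16 physical blocks on “disc”, with three readable versions of the file selected purely by metadata.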

Now think of that in an Oracle context. If we copy an entire set of database files onto DxFS, then NFS-mount the files on a machine with Oracle installed, we can configure and start up an instance to use those files. At the same time we could NFS-mount the files on another machine, configuring and starting another instance to use the same data files at the same time! Any blocks changed by the first instance would be written to disc as private copies, and any blocks changed by the second instance would be written to disc as private copies – if both instances managed to change 1% of the data in the course of the day then DxFS would end up holding 102% of the starting volume of data: the original datafiles plus the two sets of changed blocks – but each instance would think it was the sole user of its version of the files.
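
The space arithmetic is easy to check (the block count here is invented purely for illustration):

```python
db_blocks = 10_000                  # starting database size, in blocks
changed = db_blocks // 100          # each instance changes 1% in a day

# Shared base files, plus one private copy per changed block per instance:
physical_blocks = db_blocks + 2 * changed
print(physical_blocks / db_blocks)  # 1.02 -> 102% of the starting volume
```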

There’s another nice (database-oriented) feature to Delphix, though. The file system has built-in compression that operates at the “block” level. You can specify what you mean by the block size (and for many Oracle sites that would be 8KB) and the file system will transparently apply a data compression algorithm on that block boundary. So when the database writer writes an 8KB block to disc, the actual disc space used might be significantly less than 8KB, perhaps by a factor of 2 to 3. So in my previous example, not only could you get two test databases for the space of one-and-a-bit databases – you might get two test databases for 40% or less of the space of the original database.
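
As a rough sketch of per-block compression (using Python’s zlib purely as a stand-in – I’m not suggesting it is the algorithm DxFS uses, and the ratio you get depends entirely on the data):

```python
import zlib

# Compress each 8KB "database block" independently, as a block-boundary
# file system might; highly repetitive sample data, so it compresses well.
BLOCK_SIZE = 8192
datafile = (b"ORDER ROW PADDED WITH REPEATING TEXT " * 2000)[: BLOCK_SIZE * 8]

stored = 0
for off in range(0, len(datafile), BLOCK_SIZE):
    block = datafile[off:off + BLOCK_SIZE]
    stored += len(zlib.compress(block))    # space actually used on "disc"

print(f"logical {len(datafile)} bytes, stored {stored} bytes, "
      f"ratio {len(datafile) / stored:.1f}x")
```

Each block compresses (and can later be read back) independently, which is what lets the file system honour block-sized writes from the database writer while storing far less.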

Delphix vs. rman

I suggested earlier on that Delphix can be a lot cleverer than an rman restore and recover. If you take a full backup to Delphix on Sunday, and a daily incremental backup (let’s pretend that’s 1% of the database per day) for the week, then Delphix can superimpose each incremental onto the full backup as it arrives. So on Monday we construct the equivalent of a full Monday backup, on Tuesday we construct the equivalent of a full Tuesday backup, and so on. But since DxFS keeps all the old copies of blocks this means two things: we can point an instance at a full backup for ANY day of the week simply by passing a suitable “timestamp” to DxFS – and we’ve got 7 full backups for the space of 107% of a single full backup.
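
The arithmetic behind that claim, with sizes expressed as fractions of the database:

```python
full = 1.00           # the single full backup (whole database)
daily_change = 0.01   # pretend each daily incremental is 1% of the database

# Keeping 7 independent full backups the traditional way:
traditional = 7 * full

# Superimposing incrementals on a copy-on-write file system: one full
# backup plus a week's worth of changed blocks, old copies retained.
cow = full + 7 * daily_change

print(traditional)    # 7.0
print(round(cow, 2))  # 1.07 -> 7 "full backups" for 107% of one
```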

There are lots more things to say, but I think they will have to wait for tomorrow’s conversation with Kyle, and for a couple more articles.

Register of Interests / Disclosure

Delphix Corp. paid my consultancy rates and expenses for a visit to the office in Menlo Park to review their product.

11 Comments »

  1. [...] Join Jonathan Lewis and myself for a discussion and sharing of first impressions of Delphix. Jonathan worked closely with the Delphix team investigating performance, comparing technologies, and testing community-requested use cases. In this chat, Jonathan and I will have an informal discussion on the preliminary results of Jonathan’s first-hand experiences with Delphix, working closely with our team to learn about the functional aims, implementation strategies, and underlying technologies. As part of his hands-on activity, Jonathan started with the simple provisioning benefits of Delphix, and went on to look at the performance implications of various scenarios, including test cases suggested by the Oracle community. For a glimpse into what will be discussed, see Jonathan’s blog post on his visit to Delphix. [...]

    Pingback by DB Optimizer » Join Jonathan Lewis for a discussion of Delphix — April 4, 2013 @ 10:04 pm BST Apr 4,2013 | Reply

  2. Hello,

    Thanks a lot, I was eagerly expecting the info about Delphix.
    By the way, will the webinar be recorded?
    Cheers

    Pablo

    Comment by Pablo — April 5, 2013 @ 7:26 am BST Apr 5,2013 | Reply

  3. I read up on ZFS when it was first released by Sun, and had a number of significant reservations about using it for a production database system (“copy on write” leads to too many random reads and writes across the disk in my view). But for a test environment it might offer some useful features, which it seems Delphix has taken advantage of.

    What about the online redo logs? Won’t they just be continually replicated as they are written to, and so you end up with a massive history of all copies of all previous blocks in the online redo logs?

    What about other side effects, such as updating the same row in the same block multiple times over a period of time, assuming the Database Writer flushes out the block between each separate update? With Delphix you would now end up with a separate copy of that data block for each write by the Database Writer, as it never overwrites an existing copy on disk. I think this breaks your argument that updating 1% of the data needs only 1% more disk storage. It is true only if that 1% of the data is only ever updated once in any period of time, but not, I think, if it is updated multiple times over an extended period of time.

    I’ve always felt that a “snapshot” technique was a good way of taking multiple virtual copies of data files and making them shareable. You can use a bit map to track which blocks have changed, and a “copy on write” policy when any of the blocks in the copy are changed. Unchanged blocks are still read from the original source blocks, while changed blocks are written to a different area leaving the original source block unchanged. And if only a small percentage of the blocks are changed you can make significant space savings, and be able to create “copy” test environments really, really quickly as nothing is actually copied – only a bit map is created.

    Sun used to have a similar but different product for taking such a snapshot of a file system and then mounting it under a different mount point (this was some years before ZFS). It allowed multiple virtual copies of the same data set, but shared the original data via a controlling bit map. As I said, any changes to a copy were saved locally and the bit map for that copy updated. Unfortunately there were other issues with the product, and like a great many other products Sun just gave up on it and abandoned it.

    ZFS does not use bit maps I believe – it always does a “copy on write” regardless. All writes of disk blocks are always to a new physical block location on disk, leaving all old blocks still on disk with their original contents. And as you say, this gives the potential to examine the contents of disk blocks as they were at some previous point in time. At some point of course you will end up having used all available blocks on the disk, so I assume ZFS starts overwriting the oldest disk blocks and reusing them.

    John

    Comment by John Brady — April 5, 2013 @ 8:45 am BST Apr 5,2013 | Reply

    • John,

      Thanks for the comments – you raised some points which pretty much mirrored my first thoughts about how this type of technology had to work and where the “threat points” might be.

      I think it’s reasonable to say that the product Delphix is offering is not going to let you do a direct comparison of (say) the performance of last night’s batch run on the production machine with a re-run on the equivalent virtual database stored on the Delphix server. The scattering imposed on tablescans as blocks are changed is probably the most obvious source of timing differences. In one of our online events I will, no doubt, say something about the type of testing and comparison that I think would be valid.

      Your point about a small volume of the database being changed extremely frequently, leading to a long trail of database blocks and a long trail of redo blocks, is an important one. A point I didn’t make in the blog, though, was that DxFS allows you to control the way your historical trail of private copies can be released.

      For virtual databases, the system is automatically configured so that older versions (apart from the original copy) of blocks are immediately freed for reuse – this addresses the potential threat of constantly changing data blocks, and the threat of extreme volumes of redo. The total volume of online redo never exceeds the original volume (although it will be wandering around the discs), and the total volume of “extra” data blocks is only the number of blocks that have ever been changed in your virtual database. (If you start to take snapshots of virtual databases, and create virtual database branches off virtual databases, this behaviour does change.)

      I think one of the complications of the original bitmap strategy was that it was very difficult to treat it as a “rolling” process – you took a snapshot, and kept it for a while, but eventually you threw it away and took a new snapshot. With the DxFS setup it is possible to discard the oldest versions of blocks (provided no virtual databases are pointing at them) to free up space – this means that you never have to generate a new full backup after the first one. I understand that the internal coding to do this is quite subtle and sophisticated, though: deciding that nothing depends on a particular block could otherwise be quite an expensive process.

      Comment by Jonathan Lewis — April 5, 2013 @ 3:32 pm BST Apr 5,2013 | Reply

  4. The .arf file cannot be played directly on Linux. I solved it by reading this Cisco KB article (Article ID: WBX52416, “How Do I Convert an .ARF to .MP4 on Linux?”): https://support.webex.com/MyAccountWeb/knowledgeBase.do?root=Tools&parent=Knowledge

    Regards,
    Marco V.

    Comment by Marco V. — April 10, 2013 @ 9:29 am BST Apr 10,2013 | Reply

  5. I also like what I see from a storage savings and rapid deployment perspective. Just one question on how the new Oracle 12c DBCLONE ‘thin provisioning’ compares with Delphix – from what I see in the Oracle docs it looks like the same functionality.

    Comment by Margaret M — July 31, 2013 @ 10:25 pm BST Jul 31,2013 | Reply

    • Margaret,

      The notes I read with the release said something about this feature being planned where the underlying O/S (e.g. ZFS) would allow the appropriate type of snapshot, but that it was not yet available. This struck me as being a fairly typical example of Oracle trying to spoil the playing field for another company. If and when it happens it will probably be very similar in basic behaviour to Delphix – but (a) I would guess that Oracle probably has a long way to go to get to a product, and (b) the thin provisioning is only available within the same container database on the same hardware as the production system, whereas Delphix is engineered to move the clones to any other machine you have handy after the initial backup.

      Comment by Jonathan Lewis — August 4, 2013 @ 5:42 pm BST Aug 4,2013 | Reply

  6. I have been using Delphix version 2.7.5 for close to a year in our environment. I liked the features, the storage savings, and the snapshot technology. The GUI was impressive and it reduced the refresh process. The few hardships that I faced with Delphix are:

    1) When Delphix storage utilization reached 85%, the database started reporting the log file sync wait event, even though Delphix mentioned that the storage threshold is set to 95%. Once I removed some virtual databases (VDBs) it was better.
    2) With the 3.1 Delphix version they have restricted tons of database parameters; I see that the only parameters Delphix says can be customized are the archive log destination, state, and audit destination.
    (http://docs.delphix.com/display/DOCS31/Customizing+Oracle+VDB+Configuration+Settings)

    I am surprised by how Delphix is restricting Oracle init parameters; it didn’t make sense to me and I have requested more details on what is meant by “restricted”. Also, the spfile and pfile are no longer kept in $ORACLE_HOME/network/admin; Delphix maintains them and stores them in Delphix storage.

    Has anyone else used Delphix 3.1? Is there any other competitor with similar technology?

    Comment by Cherrish Vaidiyan — August 7, 2013 @ 10:09 pm BST Aug 7,2013 | Reply

  7. Nice article – it answered my question: how does a VDB share file systems from the dSource?
    Thanks Jonathan

    Comment by atul deshpande — August 16, 2013 @ 10:15 pm BST Aug 16,2013 | Reply

  8. […] 8TB database was added into Delphix as a dSource, which triggers a backup into the Delphix disk area. This is the only full backup that it ever had to take–it will forevermore use incremental […]

    Pingback by The High Price of Data :: Oracle Alchemist — March 7, 2014 @ 7:19 pm BST Mar 7,2014 | Reply

