Oracle Scratchpad

July 16, 2010

Fragmentation 2

Filed under: fragmentation, Infrastructure, Oracle — Jonathan Lewis @ 6:51 pm BST Jul 16,2010

This note is part two of a four-part series, and covers Disk and Tablespace fragmentation. The whole series is as follows:

  1. Introduction – with links to parts 2 – 4
  2. Disk and Tablespace Fragmentation – this bit
  3. Table Fragmentation
  4. Index Fragmentation

2.1 Disk “fragmentation”.

Tablespaces are made up of files, and files are stored on discs – which are often “logical volumes” rather than real devices. When you issue a (real) disc read request, the largest amount of data you can get off a (real, physical) disc in a single physical action is something like 300KB to 500KB – the content of a single circular track on a single platter of a disc.


For a larger read you continue the read by switching to a different platter (an electronic switch), switching to a different track (a radial head movement), or switching to a different disc – which means joining another disc queue (in which case your SAN software – or equivalent – will probably have worked out which discs you were going to use and started queueing all the reads concurrently).

When you create an Oracle datafile you simply do not know how scattered that file might be across the physical discs in the system. At best a 1MB read from a single file may turn into three or four rotations of a single disc with electronic platter switches; at worst I have seen a single read request turn into 32 separate disc events because of the multiple layers of software that had been used to stripe across discs, then stripe across disc groups, then stripe across logical volume groups, and so on. (This can be really good for performance if you are the only user on your SAN, but a queueing disaster on a multi-user system; and this is why it has become a good strategy to present “naked” discs to ASM and have only one layer of Oracle-aware software between Oracle and the discs.)

Maxim: don’t put too many clever layers of software between Oracle and the disc drives.
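
As a concrete illustration of the maxim, here is a minimal sketch of presenting storage straight to ASM with no extra volume-management layer. The disk group name and device paths are hypothetical, and “external redundancy” assumes the SAN hardware is handling the mirroring:

```sql
-- Hypothetical sketch: raw devices handed straight to ASM, which becomes
-- the only striping layer; mirroring is left to the SAN hardware.
CREATE DISKGROUP data EXTERNAL REDUNDANCY
    DISK '/dev/raw/raw1',
         '/dev/raw/raw2',
         '/dev/raw/raw3',
         '/dev/raw/raw4';
```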

2.2 Tablespace “fragmentation”.

Obviously you can create a tablespace using multiple files, and by definition such a tablespace is fragmented – even though there is nothing inherently wrong with this particular type of fragmentation. As I pointed out in the previous note, though, this has side effects on the placement of extents for a data segment, and this can lead to cases where you want to perform big reads but find that you are actually performing smaller reads than expected, with an associated increase in I/O queue time.
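
For illustration, here is a sketch of a multi-file tablespace – the file names and sizes are hypothetical:

```sql
-- Sketch: a tablespace built from two datafiles. Extents for a single
-- segment may be spread across both files, so what could have been one
-- large read may become several smaller ones.
CREATE TABLESPACE demo_ts
    DATAFILE '/u01/oradata/DB/demo_01.dbf' SIZE 1G,
             '/u02/oradata/DB/demo_02.dbf' SIZE 1G;
```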

The “fragmentation” effect that people are more likely to have in mind when they talk about tablespace fragmentation, though, is something that has also been called “honey-combing” and “bubbling”. This is a side effect of dropping or shrinking objects, moving tables, or rebuilding indexes: you end up with chunks of free space scattered throughout the tablespace. Each time you rebuild another object, of course, you will probably refill some of these chunks – leaving a load more holes where the object used to be.
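
If you want to see the effect, a query along the following lines lists the free “bubbles” – lots of small, scattered chunks is the honey-combing pattern. (This is only a sketch; the tablespace name is hypothetical.)

```sql
-- Sketch: show where the free space sits inside each file of a tablespace.
SELECT file_id, block_id, blocks
FROM   dba_free_space
WHERE  tablespace_name = 'DEMO_TS'
ORDER BY file_id, block_id;
```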

Basically this type of fragmentation is unlikely to matter, since you’re not doing any extra work because of the empty space – except when you back up the files. If you really think that the time spent backing up the empty space is having an important impact on your backup times (which may mean that the backup over-runs into the next dataload cycle, for example), then you might want to take some action to move objects around in a way that leaves all the empty space at the end of the files, as this will allow you to reduce the file size. (See, for example, this note on shrinking tablespaces.) Bear in mind, though, that moving objects around can introduce undesirable side effects – there was a sad little note on OTN a couple of years ago where a DBA found that moving tables made them take up more space (I wrote a note in response, quoting the reason, and solution, that I had originally published in Practical Oracle 8i).
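
As a sketch of the mechanics (names and sizes hypothetical): a file can only be shrunk back to just past its highest allocated block, so the first step is to find out where that block is:

```sql
-- Sketch: find the highest allocated block in each file of a tablespace.
SELECT file_id, MAX(block_id + blocks - 1) AS highest_used_block
FROM   dba_extents
WHERE  tablespace_name = 'DEMO_TS'
GROUP BY file_id;

-- Once the tail of a file is empty you can release the space to the O/S.
ALTER DATABASE DATAFILE '/u01/oradata/DB/demo_01.dbf' RESIZE 500M;
```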

The historic difficulties associated with “honeycombing” were largely a side effect of the pctincrease parameter that Oracle allowed you to specify for data segments – helped along by the myth that it was a good idea to keep reducing your objects to a single extent each. Since the introduction of locally managed tablespaces with their strictly limited set of options for extent sizing (especially the option for uniform sizing), the only issue is timing – when are you going to reuse the space, not how can you manage to reuse the space.
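
A sketch of the uniform-size option (names and sizes hypothetical) – with every extent exactly the same size, any freed extent can be reused by any future extent request:

```sql
-- Sketch: locally managed tablespace with uniform 1MB extents, so a
-- freed extent is always an exact fit for a later allocation.
CREATE TABLESPACE demo_uniform
    DATAFILE '/u01/oradata/DB/demo_uniform_01.dbf' SIZE 1G
    EXTENT MANAGEMENT LOCAL
    UNIFORM SIZE 1M;
```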

[Further reading on Fragmentation]

13 Comments »

  1. How “naked” is “naked”?

    In the ASM guide it recommends using hardware RAID where available.

    http://download.oracle.com/docs/cd/E11882_01/server.112/e10500/asmprepare.htm#BABJHHEC

    So by “naked” do you mean a mirrored pair of disks, or literally an individual disk, with ASM providing the redundancy?

    Just curious to hear what you prefer from a performance perspective.

    Cheers

    Tim…

    Comment by Tim Hall — July 16, 2010 @ 7:08 pm BST Jul 16,2010

    • Tim,

      I think “naked” left too much to the imagination. Hardware mirroring is going to be more efficient than software mirroring – so I’d let the SAN handle mirrors; but I would avoid striping around SAN disks (or hypers, or whatever the equivalent terminology is) if I could, and I wouldn’t use what EMC call “concatenation”.

      The target is to avoid having multiple layers of software which are trying to do the same type of thing and end up damaging each other’s ability to function well. (See also the comment below about EVA).

      I see that Alex Fatkulin has an interesting article on what you expose and how you expose it to ASM, though, looking at a different aspect of how to use your hardware: http://afatkulin.blogspot.com/2010/07/asm-mirroring-and-disk-partnership.html

      Comment by Jonathan Lewis — July 24, 2010 @ 11:06 am BST Jul 24,2010

  2. Yes, I have never thought it a good idea to present individual spindles from your SAN all the way through to all of your RAC nodes; to me that seems like a waste of what the SAN’s controller was built for.

    Comment by Curtis Ruck — July 16, 2010 @ 7:31 pm BST Jul 16,2010

  3. Hi, JL

    Would you please change the feed setting to display whole posts in the RSS feed instead of excerpts? Sometimes *.wordpress.com sites are blocked in mainland China, but we can always read your wonderful articles from Google Reader if the whole content is in the RSS feed.

    Best regards,

    Kamus

    Comment by Kamus — July 17, 2010 @ 2:59 pm BST Jul 17,2010

    • Kamus,

      I’ve changed the option for feeds from “summary” to “full text”. Originally I set it to summary because I found that when everyone had the full text option enabled it made it slow work getting through blog aggregators to find the useful stuff. I hadn’t considered the possibility that the feed option was the only option for some people.

      If I get enough complaints, though, I may switch back to summary.

      Comment by Jonathan Lewis — July 24, 2010 @ 11:10 am BST Jul 24,2010

  4. Hi Jonathan,

    this whitepaper (http://h71028.www7.hp.com/enterprise/downloads/4AA0-9728ENW_ASM_EVA_Best_Practices_White_Paper_final%20121806.pdf) from HP regarding striping for ASM shows that, according to their tests, both ASM and EVA striping (configuration 1) was the best performing config. This seems to differ from your note above.

    Comment by josh — July 18, 2010 @ 5:42 pm BST Jul 18,2010

    • Josh, I think Jonathan mentioned that the double striping is great for performance provided ASM is the only user of the underlying disks. What tends to happen, though, is that a SAN admin assigns LUNs of a predetermined size (50GB and 100GB seem to be favourites) made up of a stripe across X disks. For example, with 146GB disks, if you have 10 mirrored disks in a RAID set you get 14 x 100GB LUNs or 28 x 50GB LUNs. What then happens is that Oracle gets some of that storage, while some may be used for file storage, mail store, or other purposes; at that point you run into trouble as the Exchange server backup coincides with the production batch load and everyone starts blaming the slow database.
      It’s important to work with the SAN and storage guys on this, as you need not only a number of GB of storage but a certain quality of service (a number of IOPS is useful), so you may need more disks than you think to sustain the service times.

      Comment by Chris_c — July 19, 2010 @ 9:33 am BST Jul 19,2010

    • Josh,

      Two interesting points about that paper:

      1) There was no test for “Only ASM striping” without “EVA striping”.

      2) The most significant performance benefit came from increasing the ASM allocation unit size (a hidden parameter in that version of Oracle) from 1MB to 8MB – i.e. reducing the interference due to the double-striping effect.
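
      (For anyone reading this on a later version: from 11g the allocation unit size is a documented disk group attribute rather than a hidden parameter. A sketch, with hypothetical names and paths:)

      ```sql
      -- Sketch (11g+ syntax): set an 8MB allocation unit when creating the group.
      CREATE DISKGROUP data EXTERNAL REDUNDANCY
          DISK '/dev/raw/raw1', '/dev/raw/raw2'
          ATTRIBUTE 'au_size' = '8M';
      ```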

      Comment by Jonathan Lewis — July 24, 2010 @ 11:14 am BST Jul 24,2010

  5. […] 1-What is tablespace and disk level fragmentation? Jonathan Lewis-Fragmentation-2 […]

    Pingback by Blogroll Report 16/07/2010 – 23/07/2010 « Coskan’s Approach to Oracle — July 23, 2010 @ 3:17 pm BST Jul 23,2010

  6. Hi.

    Thanks for the clarification Jonathan.

    There are lots of conflicting messages in the whitepapers listed on the ASM section of OTN, including the double-striping issue and some setups that are just plain dodgy from an availability perspective. Not exactly a consistent message if someone is new to this subject. :)

    Cheers

    Tim…

    Comment by Tim Hall — July 24, 2010 @ 12:37 pm BST Jul 24,2010

  7. […] Disk and Tablespace Fragmentation […]

    Pingback by Fragmentation 1 « Oracle Scratchpad — December 6, 2010 @ 11:03 pm GMT Dec 6,2010

  8. We are noticing high waits on the “db file sequential read” event in an ASM/RAC environment, with a 10 ms wait time to find the data in the data file. The table is also partitioned. Platform: AIX. DB version: 10.2.0.4

    Any thoughts on this?

    Thanks,
    ~ Jayesh Patel

    Comment by Jayesh — May 5, 2011 @ 8:13 pm BST May 5,2011

    • Jayesh,

      10 ms isn’t a disaster (on average) – it’s what you might expect from a reasonably heavily loaded set of disc drives. However, with modern discs and the typical SAN cache in front of them I would hope to see an average of 6 ms or less.

      I’d start by taking a look at v$event_histogram to get a better idea of what that average was hiding. Working as an outsider I’d also do a quick check on the size of your SGA, and check whether you were suffering some side effects of Oracle shrinking the db cache to grow the shared pool. Since you’re running RAC I’d also do a quick check to see if you had a lot of “fusion writes”.
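
      A query along these lines shows the spread behind the average (just a sketch):

      ```sql
      -- Sketch: break the 10ms average into buckets; a modest average can
      -- hide a long tail of very slow reads.
      SELECT wait_time_milli, wait_count
      FROM   v$event_histogram
      WHERE  event = 'db file sequential read'
      ORDER BY wait_time_milli;
      ```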

      As far as tuning goes, the most likely issue is simply lack of precision in indexing, so you’re reading more disc blocks than you need for the data. I’d take a look at the “SQL ordered by reads” section of an AWR or Statspack report to check for guilty SQL, and “Segments by reads” to check for guilty objects. When doing so I’d keep in mind the total number of reads compared to the numbers reported in those sections, in case a lot of the reads had disappeared with flushed SQL and objects.
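
      If you want a quick online check rather than waiting for the next report, here is a sketch against v$segment_statistics (the figures are cumulative since instance startup):

      ```sql
      -- Sketch: top 10 segments by physical reads since instance startup.
      SELECT *
      FROM   (
               SELECT owner, object_name, value AS physical_reads
               FROM   v$segment_statistics
               WHERE  statistic_name = 'physical reads'
               ORDER BY value DESC
             )
      WHERE  rownum <= 10;
      ```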

      Comment by Jonathan Lewis — May 7, 2011 @ 8:16 am BST May 7,2011

