Oracle Scratchpad

February 19, 2015

255 columns

Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 12:45 am GMT Feb 19,2015

You all know that having more than 255 columns in a table is a Bad Thing ™ – and surprisingly you don’t even have to get to 255 to hit the first bad thing about wide tables. If you’ve ever wondered what sorts of problems you can have, here are a few:

  • If you’re still running 10g and gather stats on a table with more than roughly 165 columns, the query Oracle uses to collect the stats will only handle about 165 of them at a time, so you end up doing multiple (possibly sampled) tablescans to gather the stats. The reason I can’t give you an exact figure for the number of columns is that it depends on the type and nullity of the columns – Oracle knows that some column types are fixed length (e.g. date types, char() types), and if any columns are declared not null then Oracle doesn’t have to worry about counting nulls – so for some of the table columns Oracle will be able to eliminate one or two of the expressions it normally includes in the select list of the stats-gathering SQL statement, which means it can gather stats on a few more table columns. The 165-ish limit doesn’t apply in 11g – though I haven’t checked to see if there’s a larger limit before the same thing happens.
  • If you have more than 255 columns in a row (or, to split hairs a little more carefully, if you have used a column past the 255th) Oracle will split the row into multiple row pieces of 255 columns each, plus one row piece for “the rest”; but the split counts backwards from the end (or last column used), so if you have a table with 256 columns the first row-piece has one column and the second row-piece has 255 columns (there’s a small sketch demonstrating this just after the list). This is bad news for all sorts of operations because Oracle will have to expend extra CPU chasing the row pieces to make use of any column not in the first row piece. The optimists among you might have expected “the rest” to be in the last row piece. If you want to be reminded how bad row-chaining can get for wide tables, just have a look at an earlier blog note of mine (starting at this comment).
  • A particularly nasty side effect of the row split comes with direct path tablescans – and that’s what Oracle does automatically when the table is large. In many cases all the row pieces for a row will be in the same block; but they might not be, and if a continuation row-piece is in a different block Oracle will do a “db file sequential read” to read that block into the buffer cache, and it won’t be cached (see 1st comment below). As an indication of how badly this can affect performance, the results I got at a recent client site showed “select count(col1) from wide_table” taking 10 minutes while “select count(column40) from wide_table” took 22 minutes, because roughly one row in a hundred required a single block read to follow the chain.
  • An important side effect of the split point is that you really need to put the columns you’re going to index near the start of the table to minimise the risk of this row chaining overhead when you create or rebuild an index.
  • On top of everything else, of course, it takes a surprisingly large amount of extra CPU to load a large table if the rows are chained. Another client test reported 140 CPU seconds to load 5M rows of 256 columns, but only 20 CPU seconds to load the same data declared with just 255 columns.
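
If you want to see the counted-from-the-end split for yourself, here’s a minimal sketch (table and column names invented). One useful detail: ANALYZE (unlike dbms_stats) populates chain_cnt, and it counts intra-block chained rows, so the split of the 256-column row shows up even when both pieces land in the same block.

    rem  Build a 256-column table (c001 .. c256) and insert one fully populated row.
    declare
        v_ddl varchar2(32767) := 'create table t256 (c001 number';
        v_ins varchar2(32767) := 'insert into t256 values (1';
    begin
        for i in 2 .. 256 loop
            v_ddl := v_ddl || ', c' || to_char(i, 'fm000') || ' number';
            v_ins := v_ins || ', ' || i;
        end loop;
        execute immediate v_ddl || ')';
        execute immediate v_ins || ')';
        commit;
    end;
    /

    rem  ANALYZE populates chain_cnt (dbms_stats does not):
    analyze table t256 compute statistics;

    select num_rows, chain_cnt      -- expect chain_cnt = 1 for the split row
    from   user_tables
    where  table_name = 'T256';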

If you are going to have tables with more than 255 columns, think very carefully about column order – if you can get all the columns that are almost always null at the end of the row you may get lucky and find that you never need to create a secondary row piece (there’s a sketch of the effect below). A recent client had about 290 columns in one table of 16M rows, and 150 columns were null for all 16M rows – unfortunately they had a mandatory “date_inserted” column at the end of the row, but with a little column re-arrangement they eliminated row chaining and saved (more than) 150 bytes of storage per row. Of course, if they have to add and back-fill a non-null column to the table they’re going to have to rebuild the table to insert the column “in the middle”, otherwise all new data will be chained and wasting 150 bytes per row, and any old data that gets updated will suffer a row migration/chain catastrophe.
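
Here’s the sketch (assuming a table t290 built the same way as t256 above, but with 290 columns): Oracle doesn’t store trailing null columns at all, so a row whose last populated column is early in the row needs only one row piece, while a single value in column 290 forces a split into two pieces – 35 columns and 255 columns, counting backwards from the end.

    insert into t290 (c001) values (1);            -- one short row piece
    insert into t290 (c001, c290) values (2, 2);   -- split: 35 + 255 columns
    commit;

    analyze table t290 compute statistics;

    select num_rows, chain_cnt      -- expect num_rows = 2, chain_cnt = 1
    from   user_tables
    where  table_name = 'T290';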

31 Comments »

  1. Jonathan,

    we discussed migrated rows / rows with more than one row piece in the past. I have several notes:

    – the very bad thing about chained rows and direct reads is that finding the rest of the row by “db file sequential read” is never cached. It was you who pointed out this fact in a previous discussion. I reported this behaviour in an SR on MOS, and the analyst confirmed it by telling me that direct reads change the context so that all subsequent reads of the current segment do not go to the buffer cache, even for random reads

    – I’m convinced that having more than one row piece per row wouldn’t have to be as bad as it is today. Why do we see so many row pieces belonging to the same row split across different blocks? When a segment is re-formatted by alter table move (or CTAS, whatever), all observations suggest that, when we are not lucky, we get only one row per block with its pieces split across different blocks (if we get an even number of row pieces per block, there is no “split row”). This could be only 1% or less, and the impact on performance negligible. Update seems to be the trouble. Why doesn’t Oracle move all row pieces belonging to the same row when row migration kicks in? Of course, the performance of UPDATE would be slightly hurt, but the performance of SELECT wouldn’t suffer that much

    – from the same SR the Oracle analyst stated “having more than 255 columns in a table is not a performance improving technique”. That’s right; unfortunately it is not written directly in the manual, and even experienced developers are not aware of this fact

    – there is one more bad thing regarding rows with two (or more) row pieces, again learned from your blog. If we have a row with two row pieces which fit into the same block, Oracle still doubles the number of consistent gets for a read which needs columns from both pieces. This seems to be still true (at least in 11.2.0.4; I didn’t try 12c). Confirmed also by event 10200.
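
    A quick way to watch the doubled consistent gets (a sketch, using the t256 table from the sketch in the article – c001 is in the one-column first piece, c256 is in the 255-column second piece):

        select sn.name, ms.value
        from   v$mystat ms, v$statname sn
        where  sn.statistic# = ms.statistic#
        and    sn.name in ('consistent gets', 'table fetch continued row');

        select count(c001) from t256;    -- first row piece only
        select count(c256) from t256;    -- needs the second row piece

        rem  re-run the v$mystat query after each count and compare the deltas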

    Regards
    Pavol

    Comment by Pavol Babel — February 19, 2015 @ 1:49 am GMT Feb 19,2015 | Reply

  2. The good news is that at least one bug is fixed in 11.2.0.4: Bug 9373758 “Optimization to handle insert of >255 columns”.

    Comment by Vyacheslav Rasskazov — February 19, 2015 @ 9:05 pm GMT Feb 19,2015 | Reply

    • Vyacheslav,

      Interesting to note in the SR the erroneous claim (paraphrased) “… shouldn’t happen because we’re selecting from the first 255 columns.”. The informal description above (probably, assuming I made no errors) translates into: “The problem appears if you’re selecting anything after the first mod(column_count,255) columns” – so with 300 columns, mod(300,255) = 45, and selecting anything after column 45 means visiting a later row piece.

      Comment by Jonathan Lewis — February 20, 2015 @ 6:10 am GMT Feb 20,2015 | Reply

      • Can loading a fat table into a cache – say KEEP or RECYCLE – help in 11.2.0.4?

        Comment by Jacob — July 27, 2017 @ 8:57 pm BST Jul 27,2017 | Reply

        • Jacob,

          There is no obvious reason why assigning the table to the KEEP or RECYCLE cache should have any special effects, but I haven’t done any tests to find out. As a general guideline I hardly ever bother thinking about using a KEEP cache, and I only think about using a RECYCLE cache for objects that could cause cache interference problems for other sessions while they’re being accessed.

          Comment by Jonathan Lewis — July 28, 2017 @ 8:51 am BST Jul 28,2017

        • Hi Jonathan,
          The table I’m currently dealing with has 507 columns. Since Oracle is using direct path reads, I wondered if we could arrange for all the I/O to show up as consistent gets rather than having data from this table loaded into the PGA.

          Thanks
          Jacob

          Comment by Jacob — July 28, 2017 @ 9:14 am BST Jul 28,2017

        • Jacob,

          If you are doing direct path reads I wouldn’t expect to see any improvements from assigning the table to the KEEP or RECYCLE caches.

          If you blocked Oracle’s ability to do direct path reads then a RECYCLE (or KEEP, if this was the only table in it) cache might be helpful – it might keep the blocks holding chained pieces in the cache just long enough to avoid some of the re-reads that appear in the worst cases. This would depend, of course, on the size of the cache and the pattern of the processing that resulted in rows being split across multiple blocks. The benefit would appear in cases where (e.g.) block A has several rows that had all chained to block B – with direct path reads you would re-read block B several times, with cached reads you would only need to read block B once and then revisit it in the cache for the rest of the chained rows in block A.
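
          For reference, the relevant commands look like this (a sketch: the cache size is an invented figure, and _serial_direct_read is a hidden, unsupported parameter – test systems only):

              rem  assign the table to the recycle pool:
              alter table wide_table storage (buffer_pool recycle);

              rem  give the recycle cache some memory to work with:
              alter system set db_recycle_cache_size = 2G scope=both;

              rem  one (unsupported) way to stop serial direct path reads:
              alter session set "_serial_direct_read" = never;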

          Comment by Jonathan Lewis — July 28, 2017 @ 9:35 am BST Jul 28,2017

        • Some additional stats…
          Block size 8K
          507 columns, average row 20K
          The proposal is to define a separate cache – say RECYCLE – and load the entire 40GB table into memory.

          No other object will get loaded, and we expect the fat table to stay fixed in memory.

          Will this scenario hold?

          Thanks
          Jacob

          Comment by Jacob — July 29, 2017 @ 12:13 pm BST Jul 29,2017

        • Forgot to mention that the table will be joined using a hash join

          Comment by Jacob — July 29, 2017 @ 12:27 pm BST Jul 29,2017

        • Jacob,

          Using the KEEP or RECYCLE cache to keep 100% of a table permanently cached may be a lot harder than it looks at first sight. How big does the cache need to be if the table is 40GB? Just 40GB is not going to be enough. Consider what happens when someone is modifying data and you have a requirement for read-consistent blocks: will the read-consistent clones be created in the cache specified for the table, or in another cache – and what effect will excessive numbers of read-consistent clones have on the size of the cache you need and on the “other” cache where the rest of the data is? In principle if you created a RECYCLE cache that was about 6 times the size of the table you might be safe (and you might find that Oracle, in the newer versions, will do in-memory parallel query without having any hidden parameters set). In practice you may find that having a RECYCLE cache just twice the size of the table is sufficient.

          If you want to do some testing, make sure you do it with a realistic level of data modification going on and check what happens with read-consistent clones and parallel query – you may find that even when fully cached parallel query forces itself into direct path reads.

          You might like to read this (rather old) posting about some oddities I noted with the KEEP and RECYCLE cache – but note especially the comments at the end about how some things had changed in slightly newer versions of Oracle.

          Comment by Jonathan Lewis — August 3, 2017 @ 5:54 pm BST Aug 3,2017

        • Jonathan,
          We did a test beefing up the cache with enough memory to hold the table, though not enough to take into account the weekly DML and its read-consistency implications.
          In parallel we also used the appropriate hints to force the join order, and aimed for a full scan using direct I/O.

          The result was only marginally better in favour of the cache.
          It goes back to what you already mentioned: it is unlikely that cache access will be much more efficient than direct I/O.

          Thank you for your valuable reply.
          Jacob

          Comment by Jacob — August 3, 2017 @ 6:50 pm BST Aug 3,2017

  3. […] his researches he had read my article on wide rows, and had picked out of one of the comments the line: “the very bad thing about chained rows […]

    Pingback by Migrated rows | Oracle Scratchpad — May 18, 2015 @ 6:44 pm BST May 18,2015 | Reply

  4. Hi Jonathan,

    This subject of row splitting interests me very much as our database supports a Siebel app that
    has many tables with more than 255 columns. We’ve gone to great effort to order the columns in
    our tables with the most frequently accessed columns first and the columns with all null values
    at the end.

    Our goal is to minimize traversing columns and row pieces, to save space and minimize CPU. I’ve
    viewed the order of walking through the columns for a selected column as piece1(col1, col2, col3 …col255),
    piece2(col256, col257 … colnnn). You mentioned that the rows are split starting from the end.
    How does that affect the ordering we’ve done? Do I need to reorder some other way to accomplish what
    I’ve described?

    In light of how Oracle splits rows I feel apprehensive about doing much table restructuring based on observed,
    but undocumented behavior only. I’d be more confident if the row splitting logic was documented as expected
    behavior, or at least with a Metalink note. What are your thoughts on that as well?

    Thanks,

    Richard Brieck

    Comment by Richard Brieck — April 11, 2016 @ 10:19 pm BST Apr 11,2016 | Reply

    • Richard,

      This is one of those areas where it’s good to know what can happen (even though, as far as I can tell, there’s no official documentation – and even if you find a note on Metalink (MoS) that confirms my comments that, apparently, wouldn’t be considered official documentation). Internal mechanisms that have not been documented can be changed, of course, and that’s probably why Oracle Corp. chooses to avoid publishing many details about them. (In fact, in this case, I’ve found at least one note on MoS that has got it wrong, with a casual comment about the row splitting starting from the beginning.)

      Given the lack of an official statement and the level of tedious effort required to realise the potential benefits, this is an example of a case where I would take advantage of the information ONLY IF I had identified a specific table where there was a clear and significant benefit to be had if I could work out how to restructure the table correctly.

      The basic pattern of “most accessed first, nulls last” (and there’s the first conflict of interest – frequently accessed nulls should still appear early) is sound, but when considering each specific table you also have to be aware of the potential side effects of the way in which the row might be updated, and the extra (potentially extreme) chaining that might appear as a consequence (see the earlier blog linked in above).

      Comment by Jonathan Lewis — April 15, 2016 @ 10:04 am BST Apr 15,2016 | Reply

  5. Jonathan,

    Thanks for your reply. As I mentioned earlier, I’ve viewed the order of walking through the columns for a selected column as piece1(col1, col2, col3 …col255),
    piece2(col256, col257 … colnnn). Would you please walk me through a row of, say, 300 fully populated columns, as to which piece is accessed first and second? What are the columns in those pieces, and in what order are they traversed to access e.g. column 299? Sorry, I’m having some “Who’s on first?” confusion with how this happens in reality. It seems that I would need to know this before attempting any reordering of columns.

    Thanks,

    Richard

    Comment by rbrieck — April 15, 2016 @ 4:32 pm BST Apr 15,2016 | Reply

    • Richard,

      That’s not a question with a simple generic answer, unfortunately – but I can give you a couple of examples, and raise a couple of questions:

      Example 1: If we assume that you insert the whole 300 column row in one insert then columns 1 to 45 would be in the first row-piece and columns 46 to 300 would be in the second row-piece. If you then selected column 299 (e.g. by a table access by rowid) Oracle would access the first row-piece first, which would tell it that the row-piece was the first row-piece of a chain and held only 45 columns, so (I assume) it would immediately access the second row-piece and then walk the columns until it got to the 254th column of that row-piece (which would be the 299th column of the row).

      Example 2: If you inserted a row with only 200 of the 300 columns then Oracle would insert just one row-piece of 200 columns. If you then updated that row to add columns 291 to 300 Oracle would (probably) end up with the row chained into a second block, with (again) columns 1 to 45 in the first row piece and columns 46 to 300 in the second row piece in the second block. The access would be the same as example 1, but now you’d have to jump to a new block when accessing the second row piece.

      There may be more complicated effects if the update that extended the row also resulted in a space limitation problem in the first block (e.g. you updated columns 1 to 10 to be much longer so that they were the only columns that could fit in the first block) – I haven’t done any experiments to see what Oracle would do in this type of scenario, though.
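
      If you want to watch the row pieces directly, here’s a sketch (the file and block numbers must come from your own data, and “id” is an invented key column on a hypothetical t300):

          select  dbms_rowid.rowid_relative_fno(rowid) file_no,
                  dbms_rowid.rowid_block_number(rowid) block_no
          from    t300
          where   id = 1;

          alter system dump datafile 7 block 129;

      In the resulting trace file each row piece reports a flag byte (H = head of row, F = first piece, L = last piece), a column count (cc:) and, for chained pieces, an nrid: pointer to the next row piece.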

      Comment by Jonathan Lewis — April 20, 2016 @ 4:39 pm BST Apr 20,2016 | Reply

      • Jonathan,

        Thanks, that clears up a lot for me. Regarding Example 1, what would happen with the addition of a 301st column (assuming all 301 have values)? Would both pieces need to be rebuilt, with piece 1 now having columns 1 through 46 and piece 2 having columns 47 through 301? If so, Oracle choosing to split rows in reverse seems to have created a lot of extra work compared with splitting first column to last.

        Richard

        Comment by rbrieck — April 20, 2016 @ 8:04 pm BST Apr 20,2016 | Reply

        • Richard,

          It’s very hard to predict these things, though I suspected it might do something like the disaster I described in https://jonathanlewis.wordpress.com/2012/03/30/quiz-night-17, and that’s what happened in the test that I’ve just created (though the fact that the table I used was completely empty may have had some impact).

          I defined a table with 301 columns in 11.2.0.4 (not yet tested in 12.1.0.2)

          I inserted one row, populating only columns 1 and 200 – this gave me a complete row in block 129 (of file 7).

          I updated the row, populating column 300 – this gave me a first row piece of 45 columns in block 129 and a second row piece of 255 columns in block 130.

          I updated the row again, populating column 301 – this gave me a first row piece of 45 columns in block 129, a second row piece of ONE column in block 130 and a third row piece of 255 columns in block 130.
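
          For anyone who wants to repeat the experiment, a sketch of the steps (column names c001 … c301 invented; build t301 the same way as the t256 sketch in the article, but with 301 columns):

              insert into t301 (c001, c200) values (1, 200);
              commit;
              update t301 set c300 = 300;    -- 45 + 255 split, second piece chains
              commit;
              update t301 set c301 = 301;    -- 45 + 1 + 255: three row pieces
              commit;

          Dumping the relevant blocks after each step (as in the earlier sketch) shows the row-piece layout changing.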

          Comment by Jonathan Lewis — April 20, 2016 @ 8:32 pm BST Apr 20,2016

  6. […] assumed anyway, but I thought I’d publish a bit of the work I had done so that you can see another of the funny effects that appear when your table definition has too many columns (and you use […]

    Pingback by 255 columns | Oracle Scratchpad — May 19, 2017 @ 5:49 pm BST May 19,2017 | Reply

  7. […] mix certain features that majority of customers are not using, in this case in-memory column store, tables with >255 columns, and NCHAR columns, then you can start to see some ‘interesting’ […]

    Pingback by In-Memory Column Store and tables with more than 255 columns | jolliffe.hk — October 3, 2017 @ 11:57 am BST Oct 3,2017 | Reply

  8. […] rows have been split across multiple blocks then one of the nastier performance problems appears with direct path read tablescans. As Oracle follows the pointer to a secondary row piece it will do a physical read of the target […]

    Pingback by Troubleshooting | Oracle Scratchpad — August 12, 2019 @ 5:02 pm BST Aug 12,2019 | Reply

  9. Hi Jonathan,
    I was recently investigating a performance issue related to this, so I did some tests (on 19.12) and observed a few interesting things:
    1) when we create a table with more than 255 columns (and load it), the row piece with 255 columns is the last; but when we do a reorg of the segment (alter table move …) the 255-column row piece is the first (if we have 511 columns the two 255-column pieces will be first) – this can be very important if we add more columns to the table
    2) if we add a set of columns to the table, the set goes to the last row piece; if the last row piece is full (255 columns), a new row piece is created: a number of columns equal to the number we are adding is extracted from the beginning of the full row piece into a separate row piece, and the columns being added are appended to the end of the last row piece. I was able to create several hundred single-column row pieces by adding one column to the table and setting some values for the column.
    3) this is well known: a column is physically added to the blocks only when its value is set, and in that case all columns preceding it are added to the row as well (if they were not there already), even when they are null – this happens for those rows where the value was set to a non-NULL value
    4) from 1), 2) and 3): if we have a table with more than 255 columns, and the table was not reorganized, we should add new columns in bulk to avoid single-column row pieces.
    We can use alter table add (col1 ..), alter table add (col2 ..), … but the following update should update all the columns if possible, or at least the last one (though, as you wrote, mostly-null columns should go to the end of the table).
    If we add a column like this:
    alter table t add c1 …;
    update t set c1 = ..
    a single-column row piece will be created for each row where the value is set (there’s a sketch of this pattern at the end of this comment).
    If we do a reorg of the table (probably the best time is just after the initial load), the first div(x,255) row pieces will contain 255 columns and the last row piece will contain mod(x,255) columns, where x is the number of columns – so if we add a column it will be added to the last row piece and no new row piece is created (as long as mod(x,255) != 0).
    5) We can have “back and forth” chaining. I was able to create a row which had a row piece r1 in block b1; r1 referenced a row piece r2 in block b2, r2 referenced back a row piece r3 in block b1, and r3 referenced a row piece r4 in block b2 – so reading this row consumed three additional single block reads.
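
    A sketch of the pattern in note 4 that creates the single-column row pieces (names invented; assumes the last existing row piece of wide_t already holds 255 columns):

        alter table wide_t add c_new number;
        update wide_t set c_new = 0;    -- one new single-column row piece per row
        commit;

    versus adding and back-filling the columns in bulk, which keeps them together in a single new row piece:

        alter table wide_t add (c_new1 number, c_new2 number);
        update wide_t set c_new1 = 0, c_new2 = 0;
        commit;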

    Comment by Radoslav Golian — January 24, 2022 @ 9:19 pm GMT Jan 24,2022 | Reply

  10. Well, I made a mistake :) I played mostly with the updates…

    After a segment move, the new rows are formatted the “original” way – 255-column row piece last – and the new rows go to a new block, even though there is space in a block with the “reorg” formatting.

    So the reorg might make sense just before adding new columns to a wide table.

    Comment by Radoslav Golian — January 25, 2022 @ 5:44 am GMT Jan 25,2022 | Reply

    • Radoslav,

      Thanks for the follow-up.

      Axiom of Oracle investigations: “there’s always one more experiment”

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — January 25, 2022 @ 1:02 pm GMT Jan 25,2022 | Reply

  11. […] 255 columns (Feb 2015) – some side effects (with extras in the comments) […]

    Pingback by 255 column catalogue | Oracle Scratchpad — January 25, 2022 @ 12:17 pm GMT Jan 25,2022 | Reply

