Oracle Scratchpad

October 13, 2014

Memory

Filed under: Uncategorized — Jonathan Lewis @ 5:24 pm BST Oct 13,2014

On a client site recently, experimenting with a T5-2 – fortunately a test system – we decided to restart an instance with a larger SGA. It had been 100GB, but with 1TB of memory and 256 threads (2 sockets, 16 cores per socket, 8 threads per core) it seemed reasonable to crank this up to 400GB for the work we wanted to do.

It took about 15 minutes for the instance to start; worse, it took 10 to 15 seconds for a command-line call to SQL*Plus on the server to get a response; worse still, if I ran a simple “ps -ef” to check what processes were running the output started to appear almost instantly but stopped after about 3 lines and hung for about 10 to 15 seconds before continuing. The fact that the first process name to show after the “hang” was one of the Oracle background processes was a bit of hint, though.

Using truss on both the SQL*Plus call and on the ps call, I found that almost all the elapsed time was spent in a call to shmatt (shared memory attach); a quick check with “ipcs – ma” told me (as you might guess) that the chunk of shared memory identified by truss was one of the chunks allocated to Oracle’s SGA. Using pmap on the pmon process to take a closer look at the memory I found that it consisted of a few hundred pages sized at 256MB and a lot of pages sized at 8KB; this was a little strange since the alert log had told me that the instance was allocating about 1,600 memory pages of 256MB (i.e. 400GB) and 3 pages of 8KB – clearly a case of good intentions failing.

It wasn’t obvious what my next steps should be – so I bounced the case off the Oak Table … and got the advice to reboot the machine. (What! – it’s not my Windows PC, it’s a T5-2.) The suggestion worked: the instance came up in a few seconds, with shared memory consisting of a few 2GB pages, a fair number of 256MB pages, and a few pages of other sizes (including 8KB, 64KB and 2MB).

There was a little more to the advice than just rebooting, of course; and there was an explanation that fitted the circumstances. The machine was using ZFS and, in the absence of a set limit, the file system cache had at one point managed to acquire 896 GB of memory. In fact when we first tried to start the instance at with a 400GB SGA Oracle couldn’t start up at all until the system administrator had reduced the filesystem cache and freed up most of the memory; even then so much of the memory had been allocated originally in 8KB pages that Oracle had made a complete mess of building a 400GB memory map.

I hadn’t passed all these details to the Oak Table but the justification for the suggested course of action (which came from Tanel Poder) sounded like a perfect description of what had been happening up to that point. In total his advice was:

  • limit the ZFS ARC cache (with two possible strategies suggested)
  • use sga_target instead of memory_target (to avoid a similar problem on memory resize operations)
  • start the instance immediately after the reboot

Maxim: Sometimes the solution you produce after careful analysis of the facts looks exactly like the solution you produce when you can’t think of anything else to do.

July 2, 2014

Comparisons

Filed under: Uncategorized — Jonathan Lewis @ 5:09 pm BST Jul 2,2014

Catching up (still) from the Trivadis CBO days, here’s a little detail which had never crossed my mind before.


where   (col1, col2) < (const1, const2)

This isn’t a legal construct in Oracle SQL, even though it’s legal in other dialects of SQL. The logic is simple (allowing for the usual anomaly with NULL): the predicate should evaluate to true if (col1 < const1), or if (col1 = const1 and col2 < const2). The thought that popped into my mind when Markus Winand showed a slide with this predicate on it – and then pointed out that equality was the only option that Oracle allowed for multi-column operators – was that, despite not enabling the syntax, Oracle does implement the mechanism.

If you’re struggling to think where, it’s in multi-column range partitioning: (value1, value2) belongs in the partition with high value (k1, k2) if (value1 < k1) or if (value1 = k1 and value2 < k2).

June 25, 2014

AWR thoughts

Filed under: Uncategorized — Jonathan Lewis @ 3:35 pm BST Jun 25,2014

It’s been a week since my last posting – so I thought I’d better contribute something to the community before my name gets lost in the mists of time.

I don’t have an article ready for publication, but some extracts from an AWR report appeared on the OTN database forum a few days ago, and I’ve made a few comments on what we’ve been given so far (with a warning that I might not have time to follow up on any further feedback). I tried to write my comments in a way that modelled the way I scanned (or would have scanned) through the reporting – noting things that caught my attention, listing some of the guesses and assumptions I made as I went along.  I hope it gives some indication of a pattern of thinking when dealing with a previously unseen AWR report.

 

 

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,257 other followers