SUMMARY: Performance issue: VERY high kernel usage

Date: Thu Nov 17 2005 - 15:50:52 EST
I got a pretty good number of responses (and miraculously no out-of-office
messages) to this one. Most of the responders suggested trying tools I had
mentioned in my posts. A couple provided some great insight into why a kernel
can be slammed, but alas Solaris 9 really doesn't have the tools to dig
into the kernel. At least not for one who is terrified of adb (panicking a
running production system is a real CLM). I was lucky enough to move some of
the load to a couple of Solaris 10 systems. To date all systems are healthy
(knock on wood). We're guessing that spreading the load across more servers
has given us a reprieve from the problem.

For a good set of DTrace tools see  We've started looking at
these and we are firmly convinced that DTrace really rocks.

Great thanks to Rich Kulawiec and Darren Dunham for providing kind and
insightful information.

Original question:

1 CPU system running Resonant and apache
Solaris 9 (with opportunity to go to 10)
System goes nuts with 100% cpu, very high kernel usage (99% vs. user at 1%)
and very high load averages (> 30) (uptime, top)
Little or no I/O load (top, iostat)
Lots of memory, lots of swap free (top, vmstat)
No significant mutex locks (mpstat)

The question: Is there a tool for Solaris 9 that will tell me which process is
using so much kernel code? Or, if I have the opportunity to go to Solaris 10,
can someone suggest a dtrace script that will show me the same thing?
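
For the Solaris 10 half of the question, here is a minimal sketch of one
common approach, not something taken from the responses: sample the CPU with
the DTrace profile provider and count only the samples taken while in kernel
code, keyed by process name. The wrapper is Python 3, and the names
build_dtrace_command and run_profile are made up for illustration. It assumes
dtrace is on the PATH and that it is run with sufficient privileges.

```python
import shutil
import subprocess

# D program: the profile-997 probe fires on every CPU 997 times a second.
# arg0 is the kernel program counter and is non-zero only when the sample
# landed in kernel code, so the predicate /arg0/ drops user-mode samples.
KERNEL_BY_PROCESS = "profile-997 /arg0/ { @[execname] = count(); }"


def build_dtrace_command(seconds=10):
    """Return the argv for a kernel-time profile that stops after `seconds`.

    Hypothetical helper: it only assembles the command line, it does not
    run anything.
    """
    program = KERNEL_BY_PROCESS + " tick-%ds { exit(0); }" % seconds
    # -q suppresses per-probe output; the aggregation is still printed on exit.
    return ["dtrace", "-q", "-n", program]


def run_profile(seconds=10):
    """Run the profile and return dtrace's text output, or None if no dtrace."""
    if shutil.which("dtrace") is None:
        return None
    result = subprocess.run(
        build_dtrace_command(seconds), capture_output=True, text=True
    )
    return result.stdout
```

Calling run_profile(30) during a spike should print one line per process name
with its count of kernel-mode samples, busiest last. Replacing execname with
stack() in the D program would show kernel stacks instead of process names.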

Follow-on information
To add more information and restate the issue: I'm looking for something to
identify the specific culprit that is using the excessive kernel code (e.g.
the 'sy' column from vmstat) while the user code (e.g. the 'us' column from
vmstat) is very low.

There is LOTS of free memory (seen in vmstat and top) so memtool isn't much
use. I have run prstat, vmstat, top, mpstat, and lsof (though this doesn't
show process consumption). I'm still digging through lockstat. Running prstat
(e.g. prstat -va) shows interesting information, but the numbers don't add up:
I don't see any one process really eating system mode (the SYS column), and
the CPU column (prstat -t) does not total to the figures from top or vmstat.
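
To make the per-process comparison less error-prone, here is a rough Python 3
sketch that ranks processes by the SYS column of saved prstat -m output. It
assumes the usual 15-column microstate layout (PID, USERNAME, USR, SYS, ...,
PROCESS/NLWP). The function name rank_by_sys and the sample rows are invented
for illustration and are not data from the affected system.

```python
def rank_by_sys(prstat_text, top=5):
    """Return up to `top` rows as (sys_pct, usr_pct, pid, process), highest SYS first.

    Assumes `prstat -m` style rows: PID USERNAME USR SYS ... PROCESS/NLWP.
    Header, summary and blank lines are skipped.
    """
    rows = []
    for line in prstat_text.splitlines():
        fields = line.split()
        # Data rows have 15 columns and start with a numeric PID.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        try:
            usr_pct = float(fields[2])
            sys_pct = float(fields[3])
        except ValueError:
            continue
        rows.append((sys_pct, usr_pct, int(fields[0]), fields[14]))
    rows.sort(reverse=True)
    return rows[:top]


# Made-up sample in the assumed layout, for demonstration only.
SAMPLE = """\
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
  5678 root     0.1 3.2 0.0 0.0 0.0 0.0  96 0.2  40   3 310   0 example2/1
  1234 webservd 0.4  12 0.0 0.0 0.0 0.0  87 0.6 210  35  2K   0 example1/1
Total: 2 processes, 2 lwps, load averages: 30.12, 28.40, 25.03
"""

if __name__ == "__main__":
    for sys_pct, usr_pct, pid, name in rank_by_sys(SAMPLE):
        print("%6d %-12s SYS %5.1f  USR %5.1f" % (pid, name, sys_pct, usr_pct))
```

This only ranks what prstat itself reports. It cannot account for kernel time
that prstat does not attribute to any process.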
sunmanagers mailing list
Received on Thu Nov 17 15:51:17 2005
