[summary] What does my Kernel do?

From: Tobias Oetiker <oetiker_at_ee.ethz.ch>
Date: Tue Jan 13 2004 - 02:43:46 EST
Summary: How to figure out what my Solaris kernel does

Usual Suspects
* It is serving NFS ... this can use a lot of CPU. Make sure you
  are running version 3.

* A fast (Gigabit) interface can almost fill a CPU if it is busy.

* It is swapping. If the kernel runs out of memory it will spend most of its
  time moving pages back and forth between disk and RAM.

  - Run "vmstat 5". The sr (scan rate) column should be very low (<100);
    this means the system is not scanning for free memory pages.

  - It may make sense to have a lot of swap space configured, as Solaris
    does conservative memory allocation. When a process forks, Solaris
    immediately reserves all the memory the new process could need, even
    though it does not use it. Solaris does "copy on write", so why not
    have this extra memory reserved in swap instead of real RAM,
    assuming it is never going to be used anyway? (Correct me if I
    am wrong here.)
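
The sr check can be scripted. What follows is a minimal sketch, assuming a
POSIX shell and an awk that understands -v (on Solaris that means nawk or
/usr/xpg4/bin/awk, not the old /usr/bin/awk). The function name
check_scan_rate and the AWK variable are made up for this example. It finds
the sr column by its header name, so it does not depend on how many disk
columns vmstat prints.

```shell
# Hypothetical helper: flag vmstat samples whose scan rate exceeds a limit.
# Reads vmstat output on stdin; the limit defaults to 100 (the rule of thumb).
AWK=${AWK:-awk}    # assumption: set AWK=nawk on Solaris

check_scan_rate() {
    "$AWK" -v limit="${1:-100}" '
        # Column header line ("r b w swap free ..."): remember where "sr" is.
        $1 == "r" && $2 == "b" {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "sr") col = i
            next
        }
        # Data lines start with a number; the "kthr memory ..." banner does not.
        col && $1 ~ /^[0-9]+$/ {
            if ($col + 0 > limit) printf "HIGH sr=%d\n", $col
            else printf "ok sr=%d\n", $col
        }
    '
}

# Usage (on the Solaris box):
#   vmstat 5 | check_scan_rate 100
```

Keep in mind that the first line vmstat prints is an average since boot, so
only the later samples say anything about the current state.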

* It is forking ... this does not have to be a real fork bomb; it may just be
  some process quitting and being restarted immediately. Pidentd running
  without multi-threading may be such a program. Some CGI process could
  also be the cause. This is detectable by watching the 'last process
  id' with a tool like top: it climbs quickly while the system is forking.

* It is running Veritas Volume Manager and a disk has failed.
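
For the forking case, watching the last process id by eye can be replaced by
a rough calculation. This is only a sketch, assuming PIDs are handed out
sequentially and wrap at 30000 (the usual Solaris default; check pidmax on
your system), and a POSIX shell (ksh or /usr/xpg4/bin/sh on Solaris 8, since
the old /bin/sh lacks $(( ))). The function name pid_rate is invented for
this example.

```shell
# Hypothetical helper: estimate how many processes per second were created
# between two PID samples taken a known number of seconds apart.
# usage: pid_rate FIRST_PID SECOND_PID SECONDS [PID_MAX]
pid_rate() {
    first=$1
    second=$2
    secs=$3
    max=${4:-30000}          # assumption: PIDs wrap at 30000
    delta=$((second - first))
    # A negative difference means the PID counter wrapped around once.
    if [ "$delta" -lt 0 ]; then
        delta=$((delta + max))
    fi
    echo $((delta / secs))   # integer division, so small rates round down
}

# Usage: sample the PID of a freshly started shell twice, 10 seconds apart.
#   p1=$(sh -c 'echo $$'); sleep 10; p2=$(sh -c 'echo $$')
#   pid_rate "$p1" "$p2" 10
```

The sampling itself uses up a PID or two, so treat the result as an
estimate. A rate that stays in the tens or hundreds per second points at
something being restarted over and over.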

Useful Tools

* lockstat

  lockstat -gkIW sleep 60

  gives a 60 second profile of the kernel

* iftop

  will show which box is sending how much traffic through your interface

* SE Toolkit

  Virtual Adrian may be able to give some hints as to where the performance
  issues lie

* prstat

  prstat -m

  will show user vs system time for each process, so if it is a process
  causing the problem it should show here

* truss

  truss -c -p PID

  can help to identify which system calls a problematic process is spending
  its time on. A summary is printed on Ctrl-C

* iostat

  iostat -xnP 30 30

  shows where the system is writing and reading data and how much

* vmstat

  vmstat 5

  shows paging activity (check the sr column)

* kstat

  Displays kernel statistics. Did not get any useful hints on what could be
  discovered here ... but sure gives a lot of numbers

* prex

  prex -k

  Part of the Solaris tracing architecture. Note that this just opens
  a shell where you are expected to enter commands to activate the tracing. I got
  the following example ... (reading the output is another issue)

  # prex -k                                 1)
  Type "help" for help ...
  prex> buffer alloc 10m                    2)
  Buffer of size 10485760 bytes allocated
  prex> enable $all                         3)
  prex> trace $all                          4)
  prex> ktrace on                           5)
  ... wait a bit ...
  prex> ktrace off
  prex> untrace $all
  prex> disable $all
  prex> quit
  # tnfxtract ./tnf.result                  6)
  # prex -k
  Type "help" for help ...
  prex> buffer dealloc                      7)
  prex> quit
  # tnfdump ./tnf.result                    8)

  1) Run prex in kernel trace mode.
  2) Allocate an in-core kernel buffer to hold the trace data.
  3) Enable the trace set named $all. You can specify your own trace facility
     (tnf_name) set instead (e.g. all I/O operations). Refer to the prex man page.
  4) Trace the $all set.
  5) Start kernel tracing. The kernel immediately starts collecting tnf_probe
     events and storing them in the in-core buffer.
  6) Extract the contents of the kernel buffer to a file.
  7) Deallocate the in-core kernel buffer. Extract the contents of the buffer
     before deallocating it: they are erased as soon as you issue
     "buffer dealloc".
  8) Convert the raw TNF data to a readable ASCII format.
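
Since reading the tnfdump output is the hard part, a first step is to count
events per probe. This is a sketch, assuming the usual tnfdump column layout
(elapsed ms, delta ms, PID, LWPID, TID, CPU, probe name, data), which makes
the probe name the 7th field; check this against your own output before
trusting the numbers. The function name count_probes and the AWK variable
are made up for this example.

```shell
# Hypothetical helper: count trace events per probe name in tnfdump output.
# Assumption: event lines start with two decimal numbers (elapsed and delta ms)
# and carry the probe name in the 7th whitespace-separated field.
AWK=${AWK:-awk}    # assumption: set AWK=nawk on Solaris

count_probes() {
    "$AWK" '
        # Skip headers and separators; keep only lines that look like events.
        $1 ~ /^[0-9]+\.[0-9]+$/ && $2 ~ /^[0-9]+\.[0-9]+$/ && NF >= 7 {
            count[$7]++
        }
        END {
            for (name in count) printf "%d %s\n", count[name], name
        }
    ' | sort -rn             # busiest probes first
}

# Usage:
#   tnfdump ./tnf.result | count_probes | head
```

The probes at the top of the list are a reasonable place to start looking
at the detailed trace lines.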

Reading List

Sun Performance and Tuning: Java and Internet, 2nd Edition (Adrian Cockcroft)

Unlocking the kernel

Performance and Tuning on the Solaris 2.6, 7, and 8


Markus Kluge, Ramiro Santos, Allen Wooden, przemol, Casper Dik, Jon Andrews,
Thomas 'Mike' Michlmayr, Amiel Lee Yee, William Hathaway, Jeff Vaneek, Frank Smith,
Darren Dunham, Jon Andrews, Darren Dunham, Luc I. Suryo, Joe Fletcher, Mark Pfeiffer,
Joohyun Cha, Karl Vogel, Todd M. Wilkinson.

Yesterday Tobias Oetiker wrote:

> Folks,
> We have this 4 Way Sun Enterprise 420R server. With 4GB Ram and
> about 10GB swap. It runs a ton of services (Apache, Postfix,
> Amavis, Spamassassin) and it also acts as a NFS server.
> Lately we are experiencing performance issues ... the box goes to
> load 17 and responds rather sluggishly.
> When looking at the load we often see the following picture:
> 50% User
> 50% Kernel
> 0% Idle
> The 50% User is easy to attribute by looking at the processes. But
> what is the system doing in the 50% kernel time?
> Is there something like kernel-top? I played around with lockstat
> a bit, but it did not really answer my questions ...
> We are running Solaris 8.
> cheers
> tobi

 ______    __   _
/_  __/_  / /  (_) Oetiker @ ISG.EE, ETZ J97, ETH, CH-8092 Zurich
 / // _ \/ _ \/ /  System Manager, Time Lord, Coder, Designer, Coach
/_/ \.__/_.__/_/   http://people.ee.ethz.ch/~oetiker   +41(0)1-632-5286
sunmanagers mailing list
Received on Tue Jan 13 02:43:39 2004