SUMMARY - load averages >1, Why?

From: Dan A. Zambon (dzambon@afit.af.mil)
Date: Tue Mar 22 1994 - 01:48:30 CST


Hi Netters!

Finally got some time to post a summary. This job would be great if
it weren't for all the students, faculty, users.......:>)

My original post (included below) asked you fine folk for information
concerning why I was noticing a load average of > 1 on only some of my
Sparc systems. The many responses (which, as always, are greatly
appreciated) indicate the following:

Load averages above 1 are OK and not that unusual. Don't worry about it.

A little more detail might be helpful:
-Load average = the number of jobs in the run queue, averaged over the
 last 1, 5 and 15 minutes (the man pages state this; I had already found it).

-Load averages can (possibly) increase because of:
        -many processes using lots of CPU, making other processes wait
        -several processes wanting to run at the same time

-Suspended shells are one possible cause. The suggested fix is to bring
 the shell into the foreground, then suspend it again.

-vmstat 1 shows some of the same information as uptime (the command I
 used to find the load averages in the first place). It reports the number
 of processes in the run queue (the r column), as well as those blocked
 waiting for a resource (the b column).

-A few asked if the performance of the system seemed to suffer. It did not.

-Some mentioned that a haywire process (like PC-NFS) had caused their
 load averages to go nuts.

-Some attributed the problem to bugs in SunOS 4.1.3.

-A few users mentioned a hacker problem. This is not so outlandish: we
 ourselves recently found that some of our systems had been broken into.
 One of the problems was that the ps command had been replaced with one
 that hid the fact that the hacker's snooping program was running.

-A user or two mentioned unusual kernel activity, like excessive
 paging/swapping or network activity.

-A process hung waiting for a resource can push the load average over 1.
 I haven't noticed this, but it makes sense.
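For anyone curious how the three numbers are produced, here is a minimal
sketch. It assumes the 4.3BSD-style scheme that SunOS is derived from:
every 5 seconds the kernel takes the current run-queue length n and folds
it into each average as load = load * e + n * (1 - e), with
e = exp(-5/60), exp(-5/300) and exp(-5/900) for the 1, 5 and 15 minute
figures. The exact interval and constants on any given SunOS release are
an assumption here, and the function name simulate_loadavg and its input
format (one run-queue sample per line) are made up for illustration; this
is not the kernel's code.

```sh
# Hypothetical helper, not part of perf_scr: simulate a BSD-style load
# average. Reads one run-queue length per line on stdin, each sample
# assumed to be taken 5 seconds apart, and prints the 1, 5 and 15 minute
# averages after the last sample.
simulate_loadavg() {
  awk 'BEGIN {
         # decay factors exp(-interval / period), interval = 5 s (assumed)
         e1 = exp(-5 / 60); e5 = exp(-5 / 300); e15 = exp(-5 / 900)
         l1 = 0; l5 = 0; l15 = 0
       }
       {
         n = $1 + 0   # run-queue length for this sample
         l1  = l1  * e1  + n * (1 - e1)
         l5  = l5  * e5  + n * (1 - e5)
         l15 = l15 * e15 + n * (1 - e15)
       }
       END { printf("%.2f %.2f %.2f\n", l1, l5, l15) }'
}
```

Feeding it twelve samples of 2 (one minute with two processes always ready
to run) gives 1.26 0.36 0.13: the 1 minute figure is already above 1, while
the 15 minute figure has barely moved. That lag is one reason the three
numbers often disagree.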

Take this information for what it is worth. Some of it is common sense. I
have not tried to verify all of these, but I have looked around my systems
and based on the info presented here, I am beginning to get a better understanding of what is going on.

One respondent deserves special kudos for the performance tuning script he
sent (called perf_scr, included below). Pat Cain's script is a dandy,
checking CPU times, paging & swapping activity, disk saturation
(not sure what that is), and things like network condition. I have
tested this script on a few of my systems, and have learned that I have
an 8% collision rate! Maybe I didn't want to know all this.....
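The collision rate perf_scr reports is simply total collisions divided by
total output packets, summed over every interface except lo0 (check_network
flags anything above 5 %). A minimal sketch of the same arithmetic,
assuming the SunOS 4 netstat -i column layout (Name Mtu Net Address Ipkts
Ierrs Opkts Oerrs Collis Queue); the function name collision_rate is
hypothetical:

```sh
# Hypothetical helper, not part of perf_scr: compute the collision rate
# the way check_network does (total Collis / total Opkts, in percent).
# Reads netstat -i output on stdin; SunOS 4 column layout assumed:
# Name Mtu Net Address Ipkts Ierrs Opkts Oerrs Collis Queue
collision_rate() {
  awk '$1 == "Name" || $1 == "lo0" { next }    # skip header and loopback
       NF >= 9 { opkts += $7; collis += $9 }   # sum over real interfaces
       END {
         if (opkts > 0)
           printf("%.2f\n", 100 * collis / opkts)
         else
           print "0.00"                        # no output packets seen
       }'
}
```

On a live system you would pipe netstat -i into it. With made-up numbers,
an interface showing 4000 collisions against 50000 output packets comes out
at 8.00, i.e. an 8% rate.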

My thanks go to the following respondents. Ain't this newsgroup great??
olav.lerbrekk@geologi.uio.no (Olav Lerbrekk)
jv@nl.net (Johan Vromans)
Eckhard.Rueggeberg@ts.go.dlr.de (Eckhard Rueggeberg)
cciolori@tatca.tc.faa.gov (Chris Ciolorito)
ericb@telecnnct.com (Eric Burger)
Justin Keery <justin@indep.co.uk>
Robert.Wolf@dciem.dnd.ca (Robert Wolf)
jallen@nersc.gov (John Allen)
Dan Stromberg - OAC-DCS <strombrg@hydra.acs.uci.edu>
Pat Cain (Denver) <pjc@denver.ssds.com>
rjcronin@uop.com (Robert J. Cronin)
Dave Fetrow <fetrow@biostat.washington.edu>
Gautam Das <gautam@bwc.org>
"Patrick O'Callaghan" <poc@usb.ve>

*************************************************************
Now Pat's script. Please note that I do not certify that this
script is safe, and I am not responsible for any damage that it
may inflict on your poor systems. Use at your own risk.
*************************************************************
--------------------- BEGIN perf_scr INCLUSION ------------------------
#! /bin/sh

if [ `uname -s` != "SunOS" ]; then
  echo Silly rabbit! Don\'t you know trix are for SunOS\?
  exit
fi

# page out and swap out threshold values
PO_THRESH=0
SO_THRESH=0
AWK_FILE=/tmp/p.awk_$$
once=1

while getopts c c 2> /dev/null ; do
  case $c in
  c ) once=0
      ;;
  * ) echo Usage: `basename $0` '[-c]'
      exit 1
      ;;
  esac
done

# whooo are you? ooh ooh ooh ooh
release=`/bin/uname -r`

# Set up functions and variables for each OS
case $release in
4.1.* ) PATH=/bin:/usr/bin:/usr/ucb:/usr/etc
        ECHO=/usr/5bin/echo
        psax() { /bin/ps ax; }
        psaxc() { /bin/ps axc; }
        awk() { /bin/awk "$@"; }
        sol2=0
        ;;
5.* ) PATH=/bin
        ECHO=echo
        psax() { /usr/bin/ps -ea; }
        psaxc() { /usr/bin/ps -ea; }
        awk() { /usr/bin/nawk "$@"; } # awk dumps core on 5.1
        sol2=1
        ;;
* ) echo Unknown release: $release
    exit ;;
esac

if [ -f core ]; then
  echo Core file must be removed from this directory before running.
  exit
fi

# no args
eat_4_lines() {
  (line ; line ; line ; line) > /dev/null
}

# no args
get_cpu_times() {
  $ECHO 'Getting cpu times...\c'
  f=/tmp/$$
  vmstat 1 10 | (
    eat_4_lines
    t_intfaults=0 # initialize counter
    t_sysfaults=0 # initialize counter
    t_cswitch=0 # initialize counter
    t_usr=0 # initialize counter
    t_sys=0 # initialize counter
    t_idle=0 # initialize counter
    for i in 1 2 3 4 5 6 7 8 ; do # for each of the rem. lines
      set -- `line` # use shell to split the line into fields
      n=`expr $# - 6` # number of leading fields to drop
      shift $n # leaving the last six: in sy cs us sy id
      t_intfaults=`expr $1 + $t_intfaults` # accumulate interrupt faults
      shift
      t_sysfaults=`expr $1 + $t_sysfaults` # accumulate syscall faults
      shift
      t_cswitch=`expr $1 + $t_cswitch` # accumulate context switches
      shift
      t_usr=`expr $1 + $t_usr` # accumulate user
      shift
      t_sys=`expr $1 + $t_sys` # accumulate system
      shift
      t_idle=`expr $1 + $t_idle` # accumulate idle
    done
    avg_intfaults=`expr $t_intfaults / 8`
    avg_sysfaults=`expr $t_sysfaults / 8`
    avg_cswitch=`expr $t_cswitch / 8`
    avg_usr=`expr $t_usr / 8`
    avg_sys=`expr $t_sys / 8`
    avg_idle=`expr $t_idle / 8`
    echo $avg_intfaults $avg_sysfaults $avg_cswitch $avg_usr $avg_sys $avg_idle
  ) > $f
  read avg_intfaults avg_sysfaults avg_cswitch avg_usr avg_sys avg_idle < $f
  /bin/rm -f $f
  echo ok
}

# no args
check_paging_swapping() {
  $ECHO 'Checking paging/swapping...\c'
  vmstat -S 1 10 | (
    eat_4_lines
    po=0
    so=0
    for i in 1 2 3 4 5 6 7 8 ; do # for each of the rem. lines
      set -- `line` # use shell to extract args
      so=`expr $so + $7` # swap out value
      po=`expr $po + $9` # page out value
    done
    if [ $po -gt $PO_THRESH -o $so -gt $SO_THRESH ]; then
      $ECHO '\n\tAdd memory.'
      $ECHO '\tRearrange process load.'
      $ECHO '\tAnalyze process behaviour.'
      $ECHO '\tUse tmpfs or mmap().'
    else
      echo ok
    fi
  )
}

# no args
generate_awk_file() {
  cat > $AWK_FILE << EOF
  BEGIN {
    go_ahead_and_debug_it = 0;
  }
  {
    lines++;
    if ( lines == 1 )
      for(i=0; i<NF; i++) {
        j = i + 1;
        disk_name[i] = \$j;
      }
    if ( lines < 4 )
      next;
    count++;
    n_disks = NF / 3;
    read_index = 1;
    write_index = 2;
    for(i=0; i<n_disks; i++) {
      rps[i] += \$read_index;
      wps[i] += \$write_index;
      rwps[i] += ( \$read_index + \$write_index );
      read_index += 3;
      write_index += 3;
    }
  }
  END {
    if (go_ahead_and_debug_it) {
      printf("n_disks is %d\n", n_disks);
      printf("rps array is ");
      for(i=0; i<n_disks; i++)
        printf("%d ", rps[i]);
      printf("\n");
      printf("wps array is ");
      for(i=0; i<n_disks; i++)
        printf("%d ", wps[i]);
      printf("\n");
      printf("rwps array is ");
      for(i=0; i<n_disks; i++)
        printf("%d ", rwps[i]);
      printf("\n");
    }
    for(i=0; i<n_disks; i++) {
      rps[i] /= count;
      wps[i] /= count;
      rwps[i] /= count;
    }
    for(i=0; i<n_disks; i++)
      for(j=i+1; j<n_disks; j++) {
        n = rwps[i] - rwps[j];
        if (n < 0) {
          if (n < -30) {
            n *= -1;
            if (rwps[i] > 0) {
              diff = n / rwps[i];
              if (diff > 0.20) {
               printf(" Disk %s has %g %% more activity than Disk %s\n", \
                 disk_name[j], diff * 100.0, disk_name[i]);
               unbalanced = 1;
              }
            } else {
              printf(" Disk %s has %d more r-w/second than Disk %s\n", \
                disk_name[j], n, disk_name[i]);
              unbalanced = 1;
            }
          }
        } else {
          if (n > 30) {
            if (rwps[j] > 0) {
              diff = n / rwps[j];
              if (diff > 0.20) {
               printf(" Disk %s has %g %% more activity than Disk %s\n", \
                 disk_name[i], diff * 100.0, disk_name[j]);
               unbalanced = 1;
              }
            } else {
              printf(" Disk %s has %d more r-w/second than Disk %s\n", \
                disk_name[i], n, disk_name[j]);
              unbalanced = 1;
            }
          }
        }
      }
    if (unbalanced)
      printf(" Unbalanced disk load. Try moving data or striping.\n");
    unbalanced = 0;
    for(i=0; i<n_disks; i++)
      if ((rps[i] >= 15) && (wps[i] >= (5 * rps[i]))) {
        printf(" Writes/sec are %g %% the reads/sec on disk %s\n", \
          (wps[i] / rps[i]) * 100.0, disk_name[i]);
        unbalanced = 1;
      }
    if (unbalanced)
      printf(" Unbalanced read/write load. Try adding PrestoServe.\n");
  }
EOF
}

# no args
check_disk_saturation() {
  echo 'Checking disk saturation...'
  generate_awk_file
  iostat -D 1 10 | awk -f $AWK_FILE
}

# no args
check_dnlc() {
  echo 'Checking DNLC hit rate...'
  if [ $sol2 -eq 0 ]; then
    # gadzooks. someone tell me how to find maxusers another way.
    set -- `(echo 'nproc?D' | adb /vmunix | ( line > /dev/null ; line))`
    maxusers=`expr '(' $2 - 10 ')' / 16`
  else
    maxusers=`fgrep maxusers /etc/system | sed 's/^.*= *//' | cut -f1`
    if [ -z "$maxusers" ]; then
      set -- `(echo 'maxusers?D' | adb /kernel/unix | (line > /dev/null;line))`
      maxusers=$2
    fi
  fi
  set -- `vmstat -s | fgrep 'total name lookups'`
  if [ $sol2 -eq 0 ]; then
    hit_rate=`echo $7 | tr -d %`
  else
    hit_rate=`echo $7 | sed 's/%)//g'`
  fi
  total_lookups=$1
  set -- $hit_rate
  if [ $1 -lt 80 ]; then
    if [ $1 -lt 0 ]; then
      $ECHO '\tOverflow on DNLC. Re-run shortly after next reboot.'
    else
      $ECHO "\tDNLC hit rate is only $1 %. Should be at least 80 %."
      if [ $maxusers -lt 64 ]; then
        more=`expr $maxusers + 8`
        if [ $more -gt 64 ]; then
          more=64
        fi
        $ECHO "\tTry increasing MAXUSERS from $maxusers to $more"
      else
        $ECHO '\tTry increasing ncsize in param.c'
      fi
    fi
  fi
  set -- `vmstat -s | fgrep toolong`
  if [ $sol2 -eq 0 ]; then
    toolong=$2
  else
    toolong=$1
  fi
  echo $toolong $total_lookups | awk '{
    n = (($1 / $2) * 100);
    if (n > 10.0) {
      printf(" Too-long pathnames are %5.2f %% of total lookups.\n", n);
      printf(" Should be no more than 10 %%.\n");
    }
  }'
}

check_cpu() {
  echo 'Checking CPU times...'
  if [ $avg_sys -gt 30 ]; then
    if [ $avg_sysfaults -gt 11000 ]; then # 30% of 33000 (peak)
      $ECHO '\tInefficient use of system calls.'
    fi
    if [ $avg_cswitch -gt 750 ]; then # 30% of 2500 (peak)
      $ECHO '\tHigh context switch rate.'
    fi
  fi
  if [ $avg_usr -gt 70 ]; then
    n_procs=`psax | awk '{
      if ( $1 == "PID" || $1 < 300 )
        next;
      n++;
    }
    END { print n }'`
    if [ $n_procs -gt $maxusers ]; then
      $ECHO '\tHigh user time w/many processes.'
      $ECHO '\tMigrate to MP or use cron or nice.'
    else
      $ECHO '\tHigh user time w/few processes.'
      $ECHO '\tDivide processes into subprocesses, profile and optimize code.'
    fi
  fi
  if [ $avg_intfaults -gt 1000 ]; then # 30% of 3000 (peak)
    $ECHO '\tHigh interrupt rate. Culprits are:'
    vmstat -i | awk '{
      if ( $1 == "interrupt" || substr($1, 1, 4) == "----" || $1 == "Total" )
        next;
      if ( $NF > 30 && $1 != "clock" )
        printf("%s %d\n", $1, $NF);
    }' | while read device rate ; do
           $ECHO "\t\t$device ( $rate / second )"
           case $device in
           ie* | le* ) $ECHO '\t\t\tCheck transceiver or try NC400.' ;;
           mti* ) $ECHO '\t\t\tTry intelligent terminal servers.' ;;
           zs* ) $ECHO '\t\t\tCheck for noisy ports or try HSI.' ;;
           esp* ) $ECHO '\t\t\tTry SBE.' ;;
           * ) $ECHO '\t\t\tUnknown solution (now).'
           esac
         done
  fi
}

check_network() {
  echo 'Checking network condition...'
  if [ $sol2 -eq 0 ]; then
    nfs_mounts=`df -t nfs | wc -l` # actually, minus one for the header
  else
    nfs_mounts=`df -F nfs | wc -l` # actually, minus one for the header
  fi
  f=/tmp/$$
  netstat -i | egrep -v '^Name|^lo0' | (
    while read name mtu net add ipkts ierrs opkts oerrs collis queue ; do
      if [ -z "$t_ipkts" ]; then t_ipkts=0; fi
      if [ -z "$t_ierrs" ]; then t_ierrs=0; fi
      if [ -z "$t_collis" ]; then t_collis=0; fi
      if [ -z "$t_opkts" ]; then t_opkts=0; fi
      t_ipkts=`expr $t_ipkts + $ipkts`
      t_ierrs=`expr $t_ierrs + $ierrs`
      t_collis=`expr $t_collis + $collis`
      t_opkts=`expr $t_opkts + $opkts`
    done
    echo $t_ipkts $t_ierrs $t_collis $t_opkts
  ) > $f
  read t_ipkts t_ierrs t_collis t_opkts < $f
  /bin/rm -f $f
  echo $t_collis $t_opkts $t_ierrs $t_ipkts | awk '{
    coll_rate = $1 / $2;
    err_rate = $3 / $4;
    if (coll_rate > 0.05)
      printf(" High collision rate ( %g %% ). Subnet or check cabling.\n", \
        (coll_rate * 100.0));
    if (err_rate > 0.00025)
      printf(" Error rate not zero ( %g %% ). Increase buffer space.\n", \
        (err_rate * 100.0));
  }'
  set -- `nfsstat -rc | tail -1`
  echo $1 $3 $4 | awk '{
    calls = $1;
    retrans = $3;
    badxid = $4;
    if (( retrans / calls ) > 0.05 )
      if (( badxid / calls ) < 0.05 ) {
        printf(" High retransmission rate.\n");
        printf(" Check routers and bridges for dropped packets.\n");
        printf(" Try decreasing rsize and wsize in fstab\n");
        printf(" to improve NFS client I/O.\n");
      } else {
        printf(" Bad server response time for client.\n");
        printf(" Try increasing timeo in fstab to improve\n");
        printf(" NFS client I/O.\n");
      }
  }'
  if [ $sol2 -eq 0 ]; then
    udp_overflows=`netstat -s | fgrep 'socket overflows' | awk '{ print $1 }'`
  else
    udp_overflows=`netstat -s | fgrep udpInOverflows | awk '{ print $6 }'`
  fi
  if [ $udp_overflows -gt 0 ]; then
    n_nfsd=`psaxc | fgrep nfsd | wc -l`
    nn_nfsd=`expr $n_nfsd + 4`
    $ECHO "\tOverrun of nfsd processes ( $udp_overflows times )"
    $ECHO '\tTry increasing from' $n_nfsd to $nn_nfsd
  fi
  f=/tmp/$$
  nfsstat -s | tail -5 | egrep -v 'wrcache|mkdir' | (
    set -- `line`
    getattr=$4
    shift 11
    readlink=$1
    nread=$3
    set -- `line`
    nwrite=$4
    line > /dev/null
    echo $getattr $readlink $nread $nwrite
  ) | tr -d '%' > $f
  read getattr readlink nread nwrite < $f
  /bin/rm -f $f
  if [ $getattr -gt 35 ]; then
    $ECHO "\tHigh getattr count ($getattr %)."
    $ECHO '\tCheck actimeo in fstab for client NFS I/O'
    $ECHO '\t\tand increase for read-only clients.'
  fi
  if [ $readlink -gt 5 ]; then
    $ECHO "\tHigh readlink count ($readlink %)."
    $ECHO '\tCut down on number of symbolic links on NFS mounts for clients.'
  fi
  if [ $sol2 -eq 0 ]; then
    strings /vmunix | grep -is presto > /dev/null # only the exit status matters
    has_presto=$?
  else
    strings /kernel/unix | grep -is presto > /dev/null
    has_presto=$?
  fi
  if [ $nwrite -gt 5 ]; then
    $ECHO "\tHigh percentage of NFS writes ($nwrite %).\c"
    if [ $has_presto -eq 1 ]; then
      echo " Add PrestoServe."
    else
      echo " PrestoServe already installed."
    fi
  fi
  if [ $nread -gt 30 ]; then
    $ECHO "\tHigh percentage of NFS reads ($nread %). Add NC400."
  fi
}

trap "/bin/rm -f $AWK_FILE" 0

while true
do
  get_cpu_times
  check_paging_swapping
  check_disk_saturation
  check_dnlc
  check_cpu
  check_network
  if [ $once -eq 1 ]; then
    exit
  fi
done

--------------------- END perf_scr INCLUSION ------------------------



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:08:57 CDT