SUMMARY: cache hits on NFS file server

From: Mike Phillips 3788 (ukcphmr@ukpmr.cs.philips.nl)
Date: Mon Nov 07 1994 - 06:28:21 CST


Thanks to all who responded. I had 29 replies, with 4 ME-TOOs.

The original question was:
>
>Hi,
>
>I am trying to optimise the performance on a 4/470 NFS file server and seem
>to be getting nowhere fast. Can anyone out there help?
>
>The configuration I have is :
>
>Sun 4/470 SunOS 4.1.3
> 160mb RAM
> legato prestoserv card
> 2 x 900mb IPI disks
> 2 x 1300mb Micropolis disks
> 1 x 2100mb Sun / Seagate disk
> 3 x ethernets
>Kernel config maxusers 128
>nfsd processes 32
>
>The nfs clients are 20 x diskless SLCs running light applications and doing very little swapping
> 15 x diskfull Classics running light applications mostly text editing and wp
> 25 x pcs running PC-NFS 5.0/5.1 386sx , 486sx , 486dx
>
>The server holds diskspace for user/project data, OS for diskless clients and Unix ONLY applications,
>no pc applications are on the server. Most of the pcs are now running MSwindows.
>
>The changes I have so far made are :
> - Increased memory 32mb to 160mb
> - Upped maxusers 32 to 128
> - Added prestoserv card to increase write performance
>
>Unfortunately the server still seems to be getting low process cache hits
>and the load has increased rapidly recently.
>
>When I do an uptime I get :
>
>uptime
> 10:41am up 19:30, 3 users, load average: 29.75, 25.31, 20.96
>
>
>When I do a vmstat -s I get :
>
>vmstat -s
> 75407 swap ins
> 0 swap outs
> 50094 pages swapped in
> 50178 pages swapped out
> 900038 total address trans. faults taken
> 78274 page ins
> 13816 page outs
> 116319 pages paged in
> 13865 pages paged out
> 0 sequential process pages freed
> 307592 total reclaims (0% fast)
> 307586 reclaims from free list
> 1184 intransit blocking page faults
> 0 zero fill pages created
> 41672 zero fill page faults
> 0 executable fill pages created
> 0 executable fill page faults
> 0 swap text pages found in free list
> 307586 inode text pages found in free list
> 0 file fill pages created
> 0 file fill page faults
> 5578 pages examined by the clock daemon
> 0 revolutions of the clock hand
> 3797 pages freed by the clock daemon
> 32619662 cpu context switches
> 15270626 device interrupts
> 1110355 traps
> 17006424 system calls
> 1359756 total name lookups (cache hits 50% per-process)
> toolong 157888
>
>Which shows that on file system name lookups I am only getting 50% cache hits.
>
>Current load on the machine looks like this :
>
> vmstat -S 5
> procs memory page disk faults cpu
> r b w avm fre si so pi po fr de sr i0 i1 s0 s0 in sy cs us sy id
>33 0 0 0 5736875560 0 13 1 0 0 0 1 1 0 1 121 244 472 1 14 86
>31 1 0 0 57216 0 0 56 0 0 0 0 2 3 0 1 809 1382732 0 89 11
>31 1 0 0 56648 9 0 88 0 0 0 0 6 9 0 13 766 1152011 0 86 14
> 0 0 0 0 56224 6 0 48 0 0 0 0 2 2 0 1 672 1163043 0 70 30
>23 0 0 0 55960 9 0 32 0 0 0 0 3 1 0 1 677 2472573 2 74 25
> 8 0 0 0 55824 9 0 8 0 0 0 0 1 3 0 0 580 1453332 0 70 30
>27 0 0 0 55776 3 0 0 0 0 16 0 1 3 0 0 561 1273771 0 70 30
> 0 0 0 0 55760 9 0 0 0 0 0 0 1 1 0 0 579 1174296 1 71 28
>33 0 0 0 55752 21 0 24 0 0 0 0 18 6 0 3 614 1163557 0 77 23
> 6 0 0 0 55744 9 0 32 0 0 0 0 4 5 0 3 551 1773485 2 92 7
>26 0 0 0 55584 0 0 48 0 0 16 0 14 2 0 1 603 1243604 0 69 31
>31 0 0 0 55512 3 0 40 0 0 0 0 8 4 0 0 598 1033406 0 83 17
>37 0 0 0 55120 6 0 80 0 0 0 0 4 2 0 0 901 841489 1 96 3
>37 0 0 0 54824 12 0 152 8 0 0 0 7 1 0 1 909 751416 0 95 5
>33 2 0 0 54600 6 0 88 0 0 16 0 18 17 0 3 801 721296 0 98 2
>32 0 0 0 54576 6 0 40 0 0 0 0 9 2 0 1 785 49 847 0 99 1
>30 6 0 0 55000 0 0 32 0 0 0 0 6 3 0 3 825 55 795 0100 0
>38 0 0 0 55960 15 0 56 24 0 0 0 8 4 0 2 775 851601 1 94 5
>33 0 0 0 56296 12 0 96 8 0 0 0 13 4 0 1 893 992244 0 95 4
>
>When I look at the process table it is the nfsd's that have used most CPU time and
>as can be seen from the vmstat it is system that is using all the CPU.
>
>
>Sorry for the long intro but here are my questions.
>
>- If I increase maxusers again, should my cache hit performance improve?
>- Will increasing the number of nfsd processes help?
>- I strongly suspect that a user or a group of users is causing this load;
> how can I find out which NFS clients are responsible?
>- Should I just give up and accept that my server is on its last legs and needs upgrading?
>

ANSWERS
=======

On increasing maxusers: most suggested that, rather than increasing all the table sizes
with maxusers, I instead change the parameter that affects the size of the DNLC
(directory name lookup cache). Unfortunately, different people gave me different parameter names !!!
-ncsize
-ninode

Number of nfsds: a script I was sent told me to increase the number of nfsds, but most
people told me to reduce them, as the server was spending too much time context switching.

Is it a group of users? Most of the answers suggested that I obtain nfswatch and monitor
which clients were loading the server. This I did, only to find two users searching for core files
in a 700 MB filesystem every 10 minutes. The reason they gave me was that they were
short of diskspace and thought that this was a good way of making sure that core files
always got deleted. Thanks, guys !!!!!
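
If the only aim is to catch stale core files, one pass run locally on the server
(say, once a night) would avoid the repeated NFS directory walks. None of the
replies proposed this; it is a minimal sketch, and the function name
stale_core_files, the one-day age threshold and the choice to list rather than
delete are all illustrative assumptions.

```python
import os
import stat
import time

def stale_core_files(root, min_age_days=1, now=None):
    """Yield paths of regular files named 'core' older than min_age_days.

    Hypothetical helper: it only lists candidates and deletes nothing.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    for dirpath, dirnames, filenames in os.walk(root):
        if "core" not in filenames:
            continue
        path = os.path.join(dirpath, "core")
        try:
            st = os.lstat(path)  # lstat: do not follow symlinks
        except OSError:
            continue  # file vanished between listing and stat
        if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
            yield path
```

Calling list(stale_core_files("/some/filesystem")) on the server's local disk
returns the candidates for review before anything is removed.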

Another comment made was that I should try to eliminate all directories with names of over
14 characters in length, as the directory name cache does not work on these.
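
To see how much this matters, the "toolong" line in the vmstat -s output above
can be compared with the total name lookups, and the filesystem can be scanned
for offending names. This is a rough sketch: it assumes that "toolong" counts
lookups of names too long to cache, and it takes the 14-character limit from
the replies. The function names are illustrative.

```python
import os

MAX_CACHED_NAME = 14  # limit quoted in the replies; not verified here

def long_names(root, limit=MAX_CACHED_NAME):
    """Yield (path, length) for each file or directory name longer than limit."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if len(name) > limit:
                yield os.path.join(dirpath, name), len(name)

def toolong_share(total_lookups, toolong):
    """Fraction of name lookups that were too long for the cache."""
    return toolong / total_lookups

# Figures from the vmstat -s output in the original question.
print("toolong share: %.1f%%" % (100 * toolong_share(1359756, 157888)))
```

On those figures, roughly 12% of lookups could never hit the cache, whatever
its size. Running long_names() over an exported filesystem lists the names
responsible.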

WHAT I HAVE DONE
================

- increased ncsize from 2266 to 5000
- educated my two errant users.
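
The starting value of 2266 is what maxusers 128 would give under the sizing
rules usually quoted for the SunOS 4.1.x param.c. Those rules are an assumption
here, recalled rather than taken from the replies, though they do reproduce
2266. The sketch below also shows how far maxusers would have to rise to reach
5000 on its own, which illustrates why setting ncsize directly was the
preferred advice.

```python
def nproc(maxusers):
    # Assumed rule: NPROC = 10 + 16 * MAXUSERS
    return 10 + 16 * maxusers

def default_ncsize(maxusers):
    # Assumed rule: ncsize = ninode = (NPROC + 16 + MAXUSERS) + 64
    return nproc(maxusers) + 16 + maxusers + 64

def min_maxusers_for(ncsize_target):
    """Smallest maxusers whose derived ncsize reaches ncsize_target."""
    m = 1
    while default_ncsize(m) < ncsize_target:
        m += 1
    return m

print(default_ncsize(32))      # 634
print(default_ncsize(128))     # 2266
print(min_maxusers_for(5000))  # 289
```

Under these assumed rules, reaching an ncsize of 5000 through maxusers alone
would mean maxusers 289, which would also scale every other table sized from
maxusers.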

EFFECT
======

- Cache hits still ~ 50% :-((
- Load ~ 10
- CPU ~ 40%

Many thanks to:

Kenneth.Erickson@Eng.Sun.COM (Ken Erickson)
"Chris Phillips" <chris@cs.yorku.ca>
Steve Young <syoung@cedar.buffalo.edu>
ray@isor.vuw.ac.nz
gdonl@gv.ssi1.com (Don Lewis)
Nate Itkin <Nate-Itkin@ptdcs2.intel.com>
Jeff Mallory <jeff@access.digex.net>
glenn@uniq.com.au (Glenn Satchell - Uniq Professional Services)
root@oslo.Geco-Prakla.slb.com (Rune Aarstad)
Stefan Hein <hein@tubtmpo1.ee.tu-berlin.d400.de>
rad@getech.leeds.ac.uk
Markus Storm <storm@uni-paderborn.de>
Peter Galvin <pbg@cs.brown.edu>
martin@stavanger.sgp.slb.com (Martin Oksnevad)
dal@gcm.com (Dan Lorenzini)
gavin@durban.vector.co.za (Gavin Maltby)
John Dillon <dillonj@ochampus.mil>
John DiMarco <jdd@db.toronto.edu>
Tom Orban <orban@advtech.uswest.com>
nishan@lambo.alldata.com (Nishan Sandhar)
Dan Stromberg - OAC-DCS <strombrg@bingy.acs.uci.edu>
katz@rpal.rockwell.com (Morry Katz)
jayl@lattice.com (Jay Lessert)
Jean-Louis Faraut <Jean-Louis.Faraut@sophia.inria.fr>
paulo@dcc.unicamp.br (Paulo Licio de Geus)
Jeff Collyer <jeff@bundy.cnet-pnw.com>
Mike Raffety <mike_raffety@il.us.swissbank.com>
stern@sunrise.East.Sun.COM (Hal Stern - NE Area Systems Engineer)

Kind regards

Mike Phillips
Systems Manager, Philips Telecom PMR, Cambridge

e-mail : mike.phillips@ukpmr.cs.philips.nl
