SUMMARY: vmstat -s returns negative cache stats

From: Cliff Trapp (wpsun4!clifft@uunet.UU.NET)
Date: Fri Apr 09 1993 - 21:32:43 CDT


My original question:

> Dear net:
>
>
> What is wrong with my Sun 690? How can I have a negative .36 cache hits.
> I set MAXUSERS to 48 a few weeks ago. Now cache hits is worse.
>
> 577408 cpu context switches
> 1369506 device interrupts
> 366815 traps
> 1672810 system calls
> 36348888 total name lookups (cache hits -36% per-process)
> toolong 1358615
>
>Cliff Trapp
>WordPerfect Corp.
>wpsun4!clifft@uunet.uu.net

The answer is basically that vmstat -s returns a bogus percentage
when the number of lookups grows too large. The solution is to use
a couple of scripts (provided) to get the raw counters, from which
the accurate name cache hit percentage can be computed. My
understanding is that you want to hit the cache at least 70% of the
time. If you're not, you need to increase MAXUSERS on the NFS servers.

Here is a synopsis of the replies:

If the number of cache hits or the number of lookups gets too big,
the computation of the percentage fails or the numbers wrap around.
The numbers reported by vmstat -s are only valid shortly after reboot.

The only machine on my net that gives plausible numbers has been up two
days. Even normal single-user workstations give garbage statistics in
less than two weeks.

Try this, it gives the raw numbers:

#!/bin/sh
adb -k /vmunix /dev/mem << EOF
=n"Buffer Cache Statistics Since Last Boot"
="--------------------------------------"
nbuf/D"Buffer Cache Size"n
bstats/D"total buffer cache read requests"n
+/D"hits in cache"n
+/D"times an aged buf was allocated"n
+/D"times an lru buf was allocated"n
+/D"times had to sleep for a buf"n
EOF

#!/bin/sh
adb -k /vmunix /dev/mem << EOF
=n"Directory Name Cache Statistics Since Last Boot"
="-----------------------------------------------"
ncsize/D"Directory Name Cache Size"
ncstats/D"cache hits"n
+/D"cache misses"n
+/D"number of enters into cache"n
+/D"number of enters tried when already cached"n
+/D"long names tried to enter"n
+/D"long names tried to look up"n
+/D"times LRU list empty"n
+/D"number of purges of cache"
EOF
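
Once you have the raw counters, the percentage can be worked out by hand.
The following is only a sketch, not part of the replies: hit_pct is a
made-up helper name, and it assumes that total lookups is hits plus misses
plus (optionally) the too-long lookups. The arithmetic is done by awk in
floating point, so hits * 100 cannot wrap.

```shell
#!/bin/sh
# hit_pct HITS MISSES [TOOLONG]
# Hypothetical helper: print the name cache hit percentage to one decimal.
# Assumption: total lookups = hits + misses + too-long lookups.
hit_pct() {
    awk -v hits="$1" -v misses="$2" -v toolong="${3:-0}" 'BEGIN {
        total = hits + misses + toolong
        if (total <= 0) { print "n/a"; exit }   # no lookups recorded yet
        printf "%.1f\n", 100 * hits / total     # floating point, no wraparound
    }'
}

# Made-up counters for illustration: 750 hits, 250 misses.
hit_pct 750 250     # prints 75.0
```

Feed it the "cache hits" and "cache misses" values printed by the name
cache script, plus "long names tried to look up" if you want those counted.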

I am also sending you two previous SUMMARIES that discuss the tuning
documents available.

Depending on your configuration (memory size, local CPU versus NFS usage)
and OS version, the answer may vary from adding memory to increasing nbuf
or MAXUSERS.

So I advise you to read the documents available on the net, check with
ker*stat, and increase whichever buffer or resource is too small (as with
kermod).

#!/bin/sh
adb -w /vmunix << EOF
nbuf?W 0x70
EOF

  The key is that the total name lookups is > ~21000000. There is a bug
(probably where someone multiplies by 100 before their divide) in the code
which calculates the cache hit percentage. 2^31 / 100 is about 21474836, so
a signed 32-bit product would overflow once the count passes that value.
The % is only valid until the total name lookups grows too large. So, check
after a reboot...
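
To see how such a bug would produce a negative percentage, here is a small
sketch that emulates (hits * 100) / total in signed 32-bit arithmetic. It
is only an illustration of the suspected mechanism, not the vmstat source.
wrapped_pct is a made-up name, and the hit count of 29800000 is a
hypothetical value, not one measured on the 690.

```shell
#!/bin/sh
# wrapped_pct HITS TOTAL
# Hypothetical helper: emulate (hits * 100) / total as a signed 32-bit
# integer computation, which is the suspected bug in vmstat -s.
wrapped_pct() {
    awk -v hits="$1" -v total="$2" 'BEGIN {
        prod = (hits * 100) % 4294967296                # keep the low 32 bits
        if (prod >= 2147483648) prod -= 4294967296      # reinterpret as signed
        print int(prod / total)                         # truncate toward zero
    }'
}

wrapped_pct 20000000 25000000    # product fits in 32 bits: prints 80
wrapped_pct 29800000 36348888    # product wraps: prints -36
```

With the total from the original question and the assumed hit count, the
emulation gives -36 even though the true ratio would be about 82%.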

My thanks to all who responded:

louis@andataco.com
casper@fwi.uva.nl
uunet!eps.slb.com!leclerc
uunet!phyast.nhn.uoknor.edu!feldt

If anyone wants the tuning documents that were sent to me, I'll
be happy to forward them. (I haven't read them yet due to time
constraints)

Cliff Trapp
WordPerfect Corp.
wpsun4!clifft@uunet.uu.net


