SUMMARY: find/cron causes automounter broadcast storms (LONGIS

From: Howard Hart (ntmtv!harthc@ames.arc.nasa.gov)
Date: Tue Sep 03 1991 - 21:53:24 CDT


Dear NetLanders,

        Sorry for the delay, but I've been gathering answers. Several days
ago, I asked how to avoid automount/broadcast storms induced by Sun's default
find cron entry,

        15 3 * * * find / -name .nfs\* -mtime +7 -exec rm -f {} \; -o -fstype nfs -prune

touching every mount point in the auto.direct table on every machine in
the network. Thanks for the responses. It seems I'm not the only one who
has this problem. There doesn't appear to be a direct solution at this time
(I'll post another summary if someone comes up with the magic find option
that doesn't touch auto.direct mounts), so here are the current proposed
workarounds:

        1) try various permutations and options with find (-xdev holds the most
        promise) to skip NFS-mounted directories without triggering the
        automounter.

        2) since .nfsxxx files are only created on exported file systems
        (if I'm understanding this right, they must be exported read/write),
        modify the find cron entry to check only those file systems (e.g.,
        if the home directories on all machines live under /files/home and
        are mounted read/write via auto.home):

        crontab entry:
        15 3 * * * find /files/home -name .nfs\* -mtime +7 -exec rm -f {} \; -o -fstype nfs -prune

        3) stagger the find/cron checks by some permutation of the host IP
        address so all systems aren't automounting at the same time. (this
        alone will not remove the annoying root mail messages should one
        of the mountable directories be inaccessible) e.g.:

        /etc/hosts entry:
        197.192.1.200 nmtvs249

        time = 200 % 60 = 20
        
        crontab entry:
        20 3 * * * find / -name .nfs\* -mtime +7 -exec rm -f {} \; -o -fstype nfs -prune

        4) along the same lines as 3), drop the find execution frequency down
        to once a month or whatever you feel most comfortable with.

        5) remove the find entry and accept the consequences (no one could give
        specific examples of crashes or overflows on disks, though that would be
        very site/application specific).

        6) various combinations of the above.
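
The arithmetic in 3) can be sketched in a few lines of shell. This is only
an illustration under stated assumptions: stagger_minute is a hypothetical
helper name, and it uses the last octet of the address modulo 60, as in the
example in 3). Hosts whose last octets differ by a multiple of 60 would
still land on the same minute.

```shell
#!/bin/sh
# Hypothetical helper (not from any respondent's script): map the last
# octet of a dotted-quad IP address to a cron minute in the range 0-59.
stagger_minute() {
    last_octet=${1##*.}          # strip everything up to the final dot
    echo $((last_octet % 60))
}

# Example using the address from the /etc/hosts entry in 3).
minute=$(stagger_minute 197.192.1.200)

# Print the resulting crontab line; %s keeps the backslashes literal.
printf '%s %s\n' "$minute" \
    '3 * * * find / -name .nfs\* -mtime +7 -exec rm -f {} \; -o -fstype nfs -prune'
```

Run as is, this prints the same crontab entry shown in 3), with 20 in the
minute field.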

My guess is that if the command below can be modified to work, it's the best
of all possible solutions:

        find / -fstype 4.2 -name lib\* -print -o -fstype nfs -xdev -prune

NOTE: run the command that follows as root to see what it should be doing
(without traveling down the NFS-mounted file systems, of course). Check the
mounts periodically to verify that no auto.direct table entries are being
mounted.

        find / -fstype 4.2 -name lib\* -print -o -fstype nfs -prune
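
For workaround 2), the directories to check could be read from the exports
file rather than hard-coded. The following is a minimal sketch only, not
the checker any respondent sent me. It assumes an exports-style file whose
first field on each non-comment line is the exported path, and the helper
names exported_dirs and clean_exported are made up for illustration. The
-xdev option is there just to keep find on the one file system it was
started on.

```shell
#!/bin/sh
# Hypothetical helper: print the first field (the path) of every
# non-blank, non-comment line in an exports-style file.
exported_dirs() {
    awk 'NF > 0 && $1 !~ /^#/ { print $1 }' "$1"
}

# Hypothetical helper: remove .nfs* files older than 7 days from each
# exported directory, without crossing onto other file systems.
# Assumes the exported paths contain no whitespace.
clean_exported() {
    for dir in $(exported_dirs "$1"); do
        find "$dir" -xdev -name '.nfs*' -mtime +7 -exec rm -f {} \;
    done
}

# Uncomment to run against the real exports file (path is an assumption):
# clean_exported /etc/exports
```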

Semi-final resolution for us will be to use the entry in 2) above (yes,
we were stupid enough to create home directories on each of our machines
a long time ago and I inherited it until we can afford a BIG! server). I
haven't decided yet whether to drop the find frequency or stagger the times
on cron entries, though I did get a script from fischer@math.ufl.edu
to do this programmatically. My thanks to everyone below who responded,
especially auspex!guy@uunet.UU.NET (who had a more involved, automated
exported file systems checker solution than the one I suggested),
brsmith@cs.umn.edu for the solution we'll be using, and of course,
Hal Stern, stern@sunne.East.Sun.COM for an explanation even I can
understand (see below):

alc!button@fernwood.mpk.ca.us (Ross Button)
mcostel@kaman.com (Mark Costello)
fischer@math.ufl.edu (GR Fischer)
aldrich@sunrise.Stanford.EDU (Jeff Aldrich)
auspex!guy@uunet.UU.NET
miker@sbcoc.com
cdr@sachiko.acc.stolaf.edu (Craig Rice)
bb@math.ufl.edu (Brian Bartholomew)
brsmith@cs.umn.edu (Brian Smith)
hotssp!carl@att.att.com (Carl Brown)
jjd@alexander.bbn.com (Jim Alexander)
stern@sunne.East.Sun.COM (Hal Stern)
Mike.McCann@omni.eng.clemson.edu (Mike McCann)
(I apologize in advance if I missed anyone).

Hal Stern's explanation of what's really going on:
---------------------------------------------------

the files can't be written to /tmp because they're usually
not written -- they're the results of an open/unlink/close
sequence, with a crash in the middle.

more detail: the unix operating system allows you to
open a file, then unlink it (by name), but continue
using the file descriptor. the data blocks attached
to the file don't go away until you close the file.
this works OK with local filesystems, but with NFS,
it's hard to tell (on the server) whether an unlink
was preceded by an open on that very file from that
very same client. so instead, when a client
unlinks an NFS-mounted file that it already has open,
the NFS client-side code sends a "rename" request
instead of an unlink -- the file becomes .nfsXXXX.
when the final close() happens, the .nfsXXX file
is removed.

if the client crashes between the rename and the
final close, then the file sticks around, and needs
to be cleaned up. it takes really flaky hardware (or
hyperactive L1-A users) to make these files appear regularly.

--hal stern
  sun microsystems
  northeast area tactical engineering group
--------------------------------------------------------
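
The open/unlink/close sequence Hal describes can be tried on a local file
system with a few lines of shell. This is only a sketch: unlink_demo and
the scratch file are made up for illustration, and nothing here touches
NFS. Going by Hal's explanation, running the same steps in an NFS-mounted
directory should leave a .nfsXXXX file visible between the rm and the
final close.

```shell
#!/bin/sh
# Illustration only: keep reading a file through an open descriptor
# after its name has been unlinked.
unlink_demo() {
    f=$(mktemp)                  # scratch file; the name is arbitrary
    echo "still here" > "$f"
    exec 3< "$f"                 # open the file on descriptor 3
    rm -f "$f"                   # unlink the name while it is still open
    if [ -e "$f" ]; then
        echo "name still exists"
    else
        echo "name removed"
    fi
    cat <&3                      # the data is still readable via fd 3
    exec 3<&-                    # final close; the blocks can now be freed
}

unlink_demo
```

On a local file system this prints "name removed" followed by "still here".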

P.S. - We do have a lot of "hyperactive L1-A users" whose sole purpose
in life is to make me rebuild their file systems daily (just kidding....
sort of).

Howard Hart UUCP:{ames,pyramid!amdahl,hplabs}!ntmtv!harthc
System Administrator INTERNET: ntmtv!harthc@ames.arc.nasa.gov
Northern Telecom PHONE: (415) 940-2680
Mt. View, CA


