SUMMARY: Upgrade to 2.5.1 and assorted NIS+/NFS problems

From: Colin J. Wynne (cwynne@brutus.mts.jhu.edu)
Date: Tue Feb 11 1997 - 16:40:37 CST


-----BEGIN PGP SIGNED MESSAGE-----

At this point there are a few minor bugs remaining, but I'll deal with
them in their time. I want to extend a sincere and hearty thanks to
the numerous people who helped out on this one. For those who managed
to miss out on my frantic messages of the last few days, I will recap
with summaries and acknowledgements.

The original situation was this: I had an NIS+ root master server
(brutus) running Solaris 2.5 and a client (cassius) running 2.5.1, and
finally decided to upgrade the server to 2.5.1.

Here's what went wrong in the order I noticed and fixed them. It's
long, but not as long as the time I spent on all of this... (BTW, did
I mention I only get paid for 10 hours of sysadminning a week?)

PROBLEM #1
==========

  NIS+ broke utterly, as rpc.nisd kept dying on startup with a corrupt
  transaction log complaint.

  ANSWER
  ======

  It turns out that the very first suggestion, offered by

        Nick Murray <nmurray@csd.abdn.ac.uk>

  did the trick. The syslog was reporting a failed attempt to map a
  table of about 256 MB, and it turns out that there were resource
  limits which prevented a virtual memory image of that size. This
  *hadn't* been a problem under 2.5, but whilst combing through
  SunSolve for patches I noticed a 2.5 bug report about rpc.nisd
  starting up despite bad transaction logs. Apparently (and much to my
  annoyance :) 2.5.1 fixed that.

  Anyway, having rewritten the limits so that root is exempt (so sue
  me---I never figured on anything needing *that* much memory...),
  rpc.nisd started up.
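  The failure here was a process resource limit smaller than the
  mapping rpc.nisd asked for. The post doesn't say which limit was
  involved or how it was set, so the following is only a sketch under
  stated assumptions: it reads the calling process's soft
  virtual-memory limit with Python's `resource' module (RLIMIT_AS
  where available, else RLIMIT_DATA) and asks whether a 256 MB mapping
  would fit, ignoring address space already in use. The helper names
  are hypothetical.

```python
import resource

# Roughly the mapping size rpc.nisd reported in the syslog (256 MB).
MAPPING_SIZE = 256 * 1024 * 1024


def limit_allows(size, soft_limit):
    """True if `size` bytes fit under `soft_limit`.

    resource.RLIM_INFINITY means the limit is not set at all.
    """
    return soft_limit == resource.RLIM_INFINITY or size <= soft_limit


def check_vm_limit(size=MAPPING_SIZE):
    """Check this process's soft virtual-memory limit against `size`.

    Simplification: address space already in use is ignored, so True does
    not guarantee that a mapping of `size` bytes will actually succeed.
    """
    # RLIMIT_AS is the address-space limit on many Unix systems; fall back
    # to the data-segment limit where RLIMIT_AS is not provided.
    which = getattr(resource, "RLIMIT_AS", resource.RLIMIT_DATA)
    soft, _hard = resource.getrlimit(which)
    return limit_allows(size, soft)


if __name__ == "__main__":
    print("256 MB mapping within soft limit:", check_vm_limit())
```

  Resource limits are inherited across fork and exec, so a daemon
  started from a shell or boot script with a lowered limit sees that
  lower value too.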

  ADD'L INFO
  ==========

  I would like to thank several other people who responded, namely

        Cecil Pang <cecilp@adonis.westel.com>
        Virginia Coffindaffer <CoffindafferVirginia@wangfed.com>
          (who also gets the prize for having the coolest name among
           the respondents :)
        Francis Liu <fxl@pulse.itd.uts.edu.au>
  and
        Stuart Kendrick <sbk@fhcrc.org>

  They pointed out various other useful pieces of information.
  Especially helpful was Cecil's reminder that the NIS+ files are
  `hole' sensitive (that is, they may contain holes, as sparse files
  do) and must be moved about with something like ufsdump/ufsrestore
  in order to leave the files usable. In fact, at
  the time I was mounting the pre-upgrade NIS+ directory over /var/nis
  and this hadn't been a problem, but without the reminder, I probably
  would have just used tar *after* I got the rest of the setup working
  and would have broken things *again*. :)
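  A hole is a region of a file that was never written: it occupies no
  disk blocks but reads back as zeros. A tool that copies by reading
  and rewriting every byte turns those holes into allocated zeros. The
  sketch below (hypothetical helper names) shows how to spot a sparse
  file by comparing st_size with st_blocks, and what a naive byte copy
  does; it does not model how NIS+ itself uses its files.

```python
import os
import tempfile


def looks_sparse(path):
    """True if the file's logical size exceeds the space allocated for it,
    which usually means it contains holes. st_blocks counts 512-byte units."""
    st = os.stat(path)
    return st.st_blocks * 512 < st.st_size


def make_sparse_file(path, size):
    """Create a `size`-byte file that is almost entirely a hole by seeking
    past the end and writing a single byte."""
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\0")


def naive_copy(src, dst, bufsize=1 << 20):
    """Copy by reading every byte and writing it back out. Holes read back
    as zeros and are written as real data, so on most file systems the
    copy is no longer sparse."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(bufsize)
            if not chunk:
                break
            fout.write(chunk)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "sparse.db")
        copy = os.path.join(d, "copy.db")
        make_sparse_file(original, 64 * 1024 * 1024)
        naive_copy(original, copy)
        for p in (original, copy):
            print(os.path.basename(p), os.path.getsize(p), looks_sparse(p))
```

  Results depend on the file system: ones with compression or zero
  detection may store the copy sparsely anyway.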

  The other responses included some good general information about how
  NIS+ uses its files, and reminded me to checkpoint my server more
  often...

PROBLEM #2
==========

  After I got to the point where the server could access its own NIS+
  tables, I was unable to get the client to authenticate the server,
  hence the client couldn't get any of the NIS+ tables. The symptom,
  of course, was the ever-loving `corrupt window' error.

  Now, in tracking down a similar problem once before, I was pointed
  (by this list) at xntpd. NIS+ authentication is heavily reliant on
  timestamp checking to make sure requests are timely and such, so
  unless you have software to synchronize the clocks of machines in an
  NIS+ domain, using NIS+ is very hit or miss.
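  To see why unsynchronized clocks break a scheme like this, here is a
  generic timestamp-freshness check. It is a simplified model for
  illustration only, not the Secure RPC (AUTH_DES) logic NIS+ actually
  uses; the class name and the window size are hypothetical.

```python
class FreshnessChecker:
    """Generic timestamp-window check (a simplified model, NOT the actual
    Secure RPC/AUTH_DES algorithm).

    A request is accepted only if its timestamp is within `window` seconds
    of the server's clock and newer than the last one seen from that client.
    """

    def __init__(self, window):
        self.window = window
        self.last_seen = {}  # client name -> last accepted timestamp

    def accept(self, client, client_ts, server_now):
        # Reject requests whose clock disagrees with ours by more than the window.
        if abs(server_now - client_ts) > self.window:
            return False
        # Reject replays: a timestamp must be strictly newer than the last one.
        last = self.last_seen.get(client)
        if last is not None and client_ts <= last:
            return False
        self.last_seen[client] = client_ts
        return True


if __name__ == "__main__":
    checker = FreshnessChecker(window=5.0)  # arbitrary window, in seconds
    print(checker.accept("cassius", 100.0, 102.0))  # True: within window
    print(checker.accept("cassius", 100.0, 102.0))  # False: replayed timestamp
    print(checker.accept("cassius", 90.0, 102.0))   # False: clock too far off
```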

  The previous trouble had been nowhere near as serious as preventing
  the whole client machine from accessing the server; back then I'd
  only had intermittent problems with automounting. And the fact of the
  matter was, I *was* running xntpd.

  Or so I thought. It turns out that (I believe at the recommendation
  of the xntpd install docs) xntpd was being started from /etc/rc2.d
  later than the RPC services, so the clocks weren't synchronized in
  time for authentication to succeed. Just to add to the subtlety of
  the problem, the clocks were off by just about a second, which meant
  that a few times when I compared `date' outputs, they came out the
  same. Anyway, I moved the xntpd start ahead of rpc, and the
  authentication problems went away. (I also moved sshd ahead of rpc;
  that way, when the machine hung at NIS+ startup and didn't know any
  users, I could still ssh in as root from a remote machine.)
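  The fix relies on how the rc directories work: at boot, /sbin/rc2
  runs the /etc/rc2.d scripts whose names start with `S' in lexical
  order, so renaming a script to a lower number moves it earlier. The
  sketch below models that ordering. The xntpd and sshd script names
  are hypothetical, and S71rpc is assumed to be the RPC start script.

```python
def start_order(entries):
    """Return the start scripts in an rc directory in the order they run:
    names beginning with 'S', sorted lexically (as the shell glob S*
    expands in the C locale)."""
    return sorted(e for e in entries if e.startswith("S"))


def runs_before(first, second, entries):
    """True if start script `first` runs before start script `second`."""
    order = start_order(entries)
    return order.index(first) < order.index(second)


if __name__ == "__main__":
    # Hypothetical /etc/rc2.d listings before and after renaming.
    before = ["README", "S71rpc", "S88sshd", "S99xntpd"]
    after = ["README", "S69xntpd", "S70sshd", "S71rpc"]
    print(start_order(before))
    print(runs_before("S69xntpd", "S71rpc", after))
```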

  Thanks to

        Kevin Davidson <tkld@cogsci.ed.ac.uk>

  for responding on this one.

PROBLEM #3
==========

  Okay, near the end, now. This time around the client could get NIS+
  tables, knew who the users were, etc., but hung when NFS-mounting
  anything from the server, with an `RPC: Program not registered'
  error. All I ended up doing was stopping and then restarting the
  nfs.server script on the server. I don't know why this was
  necessary, since both machines had just been freshly booted with
  what was then the current configuration. Thanks to

        Mattias Zhabinskiy <mattias@txc.com>
        John Justin Hough <john@oncology.uthscsa.edu>
        Rasana Atreya <atreya@library.ucsf.edu>
        Mike D. Kail <mdkail@fv.com>
        Aline Runde <ARunde@mms.com>
        Rick von Richter <rickv@mwh.com>
  and
        Jim Harmon <jim@telecnnct.com>

  for helping with this last part.

  (Rasana also pointed me towards the searchable list archives at

        http://www.LaTech.edu/sunman-search.html

  which is certainly nice to know.)

  I do have a question related to this last topic, though. Everybody
  mentioned the various daemons which need to be running on the NFS
  server: statd, check; mountd, check; biod... I ain't got it. Did
  this go away with 2.5.1? It is nowhere to be found, and its absence
  doesn't seem to be hindering me.
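  An `RPC: Program not registered' error means the request reached the
  server's rpcbind (portmapper), but no service had registered the
  requested program number. One way to check is `rpcinfo -p brutus'
  from the client. The sketch below parses that style of output and
  reports which NFS-server programs are missing; the sample text is
  invented for illustration, and only the program numbers (100003 nfs,
  100005 mountd, 100021 nlockmgr, 100024 status) are standard.

```python
# Standard ONC RPC program numbers for the NFS server side.
NFS_SERVER_PROGRAMS = {
    100003: "nfs",
    100005: "mountd",
    100021: "nlockmgr",
    100024: "status",
}


def registered_programs(rpcinfo_output):
    """Collect program numbers from `rpcinfo -p`-style output."""
    programs = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data lines begin with a numeric program number; this skips the header.
        if fields and fields[0].isdigit():
            programs.add(int(fields[0]))
    return programs


def missing_nfs_programs(rpcinfo_output):
    """Names of NFS-server programs with no registration, sorted."""
    have = registered_programs(rpcinfo_output)
    return sorted(name for num, name in NFS_SERVER_PROGRAMS.items()
                  if num not in have)


# Invented sample output: status and nlockmgr are registered,
# but nfs and mountd are not.
SAMPLE = """\
   program vers proto   port  service
    100000    2   udp    111  portmapper
    100024    1   udp  32772  status
    100021    1   udp  32773  nlockmgr
"""

if __name__ == "__main__":
    print(missing_nfs_programs(SAMPLE))  # ['mountd', 'nfs']
```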

FINALLY
=======

  There are two (2) remaining niggling little details which I haven't
  yet worked out, and which I think relate to NIS+. First off, I use
  the man_db package instead of Sun's man. man_db likes to have
  important things owned by a user called man: the man directories
  are owned by man, and the man binary itself is setuid man. Well,
  when I run man on the client, cassius, it hangs; on the server it
  doesn't. man is a local user (not an NIS+ user), and nsswitch.conf
  is indeed set to check files first.
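  Checking that nsswitch.conf consults local files first can be
  scripted. This sketch (hypothetical function names, invented sample
  text) pulls the sources for one database out of nsswitch.conf-style
  text in lookup order; it only verifies the configuration and says
  nothing about why man hangs.

```python
import re


def lookup_sources(nsswitch_text, database):
    """Return the sources listed for `database` in nsswitch.conf text,
    in the order they are consulted, e.g. ['files', 'nisplus']."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        name, sep, rest = line.partition(":")
        if not sep or name.strip() != database:
            continue
        # Remove action criteria such as [NOTFOUND=return]; keep source names.
        rest = re.sub(r"\[[^\]]*\]", " ", rest)
        return rest.split()
    return []


def checks_files_first(nsswitch_text, database):
    """True if local files are the first source consulted for `database`."""
    sources = lookup_sources(nsswitch_text, database)
    return bool(sources) and sources[0] == "files"


SAMPLE = """\
# illustrative nsswitch.conf fragment
passwd:     files nisplus
group:      files nisplus
hosts:      nisplus [NOTFOUND=return] files
"""

if __name__ == "__main__":
    print(lookup_sources(SAMPLE, "passwd"))
    print(checks_files_first(SAMPLE, "passwd"))
```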

  A not unrelated problem involves root on the client, which is
  obviously not an NIS+ account. You see, the server exports
  /var/mail to the client. This includes the root mail file. Well,
  this worked fine before, but now mail (mailx/Mail) hangs when
  invoked by root on the root mailbox.

  In both cases this is a pretty serious hang, the kind that requires
  killing the underlying shell and leaving a zombie. The logs don't
  seem to indicate where the problem is, and I haven't gotten anything
  out of the truss yet.

But anyway, these are now minor problems, and my system is by and
large usable again. (And there was much rejoicing.)

I'd just like to point out that, once again, this list proves
invaluable in its help, while our multi-kilobuck tech support contract
did exactly squat. Thanks again for all your help,

CJW

- --
**********************************************************************
    /\ Colin J. Wynne Johns Hopkins University
   (()) Dep't of Mathematical Sciences
  /____\ ``Lunatic-at-Large'' E-Mail: cwynne@mts.jhu.edu
 /______\
/________\ The cost of living hasn't affected its popularity.
**********************************************************************

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2
Comment: http://www.mts.jhu.edu/~cwynne/

iQCVAwUBMwD1YXEHfObrVHptAQHxcwP/Y2Wb4IJAuosK7k7mnS3zYBeo3ACBjRWJ
eTSv9RX81zhcyUE+PR0U6s2mqCvXMruxpkm6c07yWJu8u+gyOhC0GUJxhonQTgTD
tQZOlDISKeOfbEI6vDZyzB2fc7jhpkUTQXo3XN2f7MROWYnI37S81X6uiOeFekfM
YQUYxEMJXAo=
=d+N3
-----END PGP SIGNATURE-----



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:11:46 CDT