SUMMARY: NIS+ client failover

From: Stuart Kendrick (sbk@fhcrc.org)
Date: Thu Jan 23 1997 - 14:10:36 CST


I've been investigating how NIS+ clients failover to alternate NIS+
servers, when their favorite one dies.

Thanks to Mike Jones for lots of insights and a URL:
http://ee.snu.ac.kr/~ramdrive/NIS+_FAQ.html

--sk

Stuart Kendrick
Network Services
FHCRC

Original message:

I'm trying to understand how NIS+ clients failover to backup NIS+ servers.
I've been watching traffic emanating from a NIS+ client with a Sniffer,
while yanking the network connection to its NIS+ server.

I can see that the client queries its primary server a few times, then
tries one of the other servers in the domain. (The failover pattern
varies. Example 1: packet, wait two seconds, packet, wait two seconds,
try next server. Example 2: packet, wait six seconds, packet, wait six
seconds, packet, wait two seconds, try next server. Example 3: packet,
wait three seconds, packet, wait six seconds, packet, wait six seconds,
try next server.)
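The retransmit pattern above can be sketched as a simple loop. (This is a toy model in Python, not Sun's RPC code; the function name and the timeout schedule are invented for illustration.)

```python
import socket

# Toy sketch of the observed pattern: retransmit a query to the current
# server with a timeout, and give up after a few tries. The timeout values
# are illustrative guesses, not Sun's actual schedule.
def query_with_retries(send_query, timeouts=(2, 2)):
    """send_query(timeout) returns a reply or raises socket.timeout.
    Returns the reply, or None once every retransmit has timed out."""
    for timeout in timeouts:        # e.g. packet, wait 2s, packet, wait 2s
        try:
            return send_query(timeout)
        except socket.timeout:
            continue                # no answer; retransmit
    return None                     # caller now fails over to other servers
```

A None return is the point where the client would move on and probe the other servers in the domain.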

And it doesn't just try the next server, it fires off a packet to each of
the remaining servers in the domain, then picks one of the servers which
responds.

I don't ever see the client giving up on its primary server. I would
expect that at some point it would say "Gee, this guy is dead, I'll quit
asking him questions." This is how the DNS resolver code works: every now
and then, the client throws a DNS query toward the first server, just to
see if it has returned to life.

/usr/sbin/nscd caches some information; for instance, once I had done a
"nisgrep machine hosts.org_dir", the entire hosts table was cached locally
at the client, and subsequent host look-ups didn't hit the wire at all.
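A rough model of that cache-the-whole-table behavior (hypothetical Python, not nscd's actual implementation; the class and counter names are invented):

```python
# Toy sketch of the caching described above: the first lookup fetches the
# whole table over the wire; every lookup after that is served locally.
class TableCache:
    def __init__(self, fetch_table):
        self.fetch_table = fetch_table  # e.g. pulls hosts.org_dir from a server
        self.table = None
        self.wire_hits = 0              # trips to the server, for illustration

    def lookup(self, key):
        if self.table is None:          # first lookup: one trip to the server
            self.table = self.fetch_table()
            self.wire_hits += 1
        return self.table.get(key)      # everything after that is local
```

The first lookup costs one round trip; subsequent lookups never hit the wire, which matches the quiet network after the initial nisgrep.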

From the user point of view, things like logging in take longer, because
the passwd table and cred table look-ups go through this failover pattern
before succeeding.

If I reboot, the client selects some other, living, NIS+ server, and
things are hunky-dory.

Does anyone know what algorithms are used by a client to decide how to
failover to backup servers, when its first one dies?

And is there a way to declare the master server off-limits to the clients?
In this case, I would prefer not to have any clients using a domain's
master server; I would prefer that they hang off the replicas. This is
easy to do with DNS, but I suspect hard to do with NIS+.

--sk

Stuart Kendrick
Network Services
FHCRC

---------- Forwarded message ----------
Date: Tue, 14 Jan 1997 14:56:12 -0500
From: Mike Jones <jonesmd@unifiedtech.com>
To: sun-managers@ra.mcs.anl.gov, sbk@fhcrc.org
Subject: Re: NIS+ client failover

Stuart Kendrick writes...
> I'm trying to understand how NIS+ clients failover to backup NIS+ servers.
> I've been watching traffic emanating from a NIS+ client with a Sniffer,
> while yanking the network connection to its NIS+ server.

Oh, isn't that *fun*? I really like NIS+, but I wish that Sun would
(a) bribe, beg, or bully some other vendors into providing client support,
and (b) give out more information about how it works....
 
> I can see that the client queries its primary server a few times, then
> tries one of the other servers in the domain. (The failover pattern
> varies. Example 1: packet, wait two seconds, packet, wait two seconds,
> try next server. Example 2: packet, wait six seconds, packet, wait six
> seconds, packet, wait two seconds, try next server. Example 3: packet,
> wait three seconds, packet, wait six seconds, packet, wait six seconds,
> try next server.)
> And it doesn't just try the next server, it fires off a packet to each of
> the remaining servers in the domain, then picks one of the servers which
> responds.

I'm not sure about why the failover is different, but I can explain the
rest. NIS+ doesn't have a concept of the "next server". When a client
is booted, it picks a "closest server" (more on how in a moment) and
caches that in the NIS_COLD_START file. Whenever a NIS+ request is made
by the client, it first tries the "closest server". If that fails, it
does a "NIScast" (yes, that's what Sun calls it) to all the servers in
its domain (this information is also in the NIS_COLD_START file) and
binds to the one that answers first. This is also how it picks the "closest
server". This "binding" is done on a per-process basis, which is why (for
example) the failover happens every time a user logs in, but not every
time his shell needs a lookup after that.
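That binding scheme can be sketched like this (a toy model in Python, not Sun's code; ping() stands in for a real RPC probe and is assumed to raise on a dead server, and all names are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Toy model of the binding described above: try the cached "closest server"
# first; if that fails, probe every server in the domain at once (the
# "NIScast") and bind to whichever answers first.
def bind_server(closest, all_servers, ping):
    try:
        ping(closest)
        return closest                  # the cached binding still works
    except Exception:
        pass                            # closest server is dead; NIScast
    with ThreadPoolExecutor(max_workers=max(len(all_servers), 1)) as pool:
        futures = {pool.submit(ping, s): s for s in all_servers}
        for f in as_completed(futures):
            if f.exception() is None:
                return futures[f]       # first server to answer wins
    raise RuntimeError("no NIS+ server responded")
```

Because the winner is simply whichever server answers first, the "closest server" falls out of network conditions at bind time rather than any static configuration.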

I'm not 100% certain on some of this, particularly on what information
is in the NIS_COLD_START file and what is in the NIS_SHARED_DIRCACHE. I
know that a list of all the servers is kept in the cold start file, and
that the NIScast stuff works as I described. The "closest server" may just
be cached in memory by the client-side cache manager, nis_cachemgr.
 
> I don't ever see the client giving up on its primary server. I would
> expect that at some point it would say "Gee, this guy is dead, I'll quit
> asking him questions." This is how the DNS resolver code works. And
> every now and then, the client throws a DNS query toward the first server,
> just to see if the first server has returned to life.
 
The client doesn't ever give up on its primary server. Since a process
sticks to a particular server once it binds, I can only presume that Sun
figured it would be better to generate once-per-process failover traffic
than to assign some number of processes to a backup server once the
primary has failed. The choice they made ensures that (a) any processes
that start once the primary server is back up will immediately go to it,
and (b) if the failover process hits a particular backup server a lot and
slows it down, it will begin to choose other servers for some amount of
load balancing.

...stuff deleted....

> And is there a way to declare the master server off-limits to the clients?
> In this case, I would prefer not to have any clients using a domain's
> master server, I would prefer that they hang off the replicas. This is
> easy to do with DNS, but I suspect hard to do with NIS+.

Hm. If you're brave you could go into the NIS_COLD_START file and mangle
the IP address of the master to guarantee that no one will bind to it.
I'm not sure what the side effects might be, though.

        Mike Jones
        Sr. Network Computing Advisor
        UNIFIED Technologies



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:11:43 CDT