SUMMARY: DNS emergency *** Can't find server name for address

From: System Admin (sysadmin@jetlink.net)
Date: Mon Feb 26 1996 - 04:04:51 CST


Sun Managers,

We had a problem where the stock Sun in.named stopped working. I think I know
what happened, but the process to figure it out was a rollercoaster. There
were approximately 10 responses to my SOS messages, all of which offered
very friendly help. Only two of them turned out to apply to this problem.

We confirmed that it had nothing to do with file permissions, named.boot,
or the db.* files. resolv.conf and nsswitch.conf were correct as well.

The two most helpful replies came from:

Mark Anderson (anderson@neon.mitre.org)
(Who recommended using Berkeley BIND because Sun's in.named was not as good)

And Daniel Blander (Daniel.Blander@ACSacs.com)
(Who confirmed that we had not been hacked by the recent CERT problem with
BIND, and recommended running in.named in debug mode through truss.)

We had personal help from Mark C. Cooper (mark@dingo.cv.com) who works as
a technician for our support company ComputerVision. He guided me through
the truss debug procedure.

What happened to us?
--------------------
We added a domain to our named.boot and in.named would not work anymore. The
daemon would run, but hang on execution. The nslookup program could not
interact with the nameserver.

What we did about it:
---------------------
The first time it broke, we upgraded from Solaris 2.4 to Solaris 2.5 and
the problem went away. Then I added a few more domains to the named.boot
file and it broke again. (Backing out the change did not help; it stayed
broken.) I had to reload 2.5 AGAIN to get a functional DNS machine back.

After running named through truss we determined that named was starving for
kernel resources, even though the machine had at least 300 MB of free
virtual memory, plenty of RAM, and a load of 0.01. The truss output showed
error#11 (EAGAIN, an out-of-resources error) all over the place.
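
For anyone repeating the truss step, tallying the error codes makes a
pattern like ours easy to spot. What follows is only a minimal sketch in
Python, not the procedure we actually followed. It assumes the trace was
saved to a file (for example with truss -o) and that failed calls are
marked "Err#<number>" followed by the errno name. The function name
tally_truss_errors and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# truss marks a failed system call with "Err#<number>", usually followed
# by the symbolic errno name (for example "Err#11 EAGAIN").
ERR_PATTERN = re.compile(r"Err#(\d+)(?:\s+([A-Z][A-Z0-9]*))?")


def tally_truss_errors(lines):
    """Count failed system calls per (errno number, errno name)."""
    counts = Counter()
    for line in lines:
        match = ERR_PATTERN.search(line)
        if match:
            number = int(match.group(1))
            name = match.group(2) or "?"  # name missing from the line
            counts[(number, name)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, NOT taken from our actual trace.
    sample = [
        'open("/etc/named.boot", O_RDONLY)  = 4',
        'read(5, 0xEFFFF6E8, 512)  Err#11 EAGAIN',
        'read(5, 0xEFFFF6E8, 512)  Err#11 EAGAIN',
        'open("/etc/missing", O_RDONLY)  Err#2 ENOENT',
    ]
    for (number, name), count in tally_truss_errors(sample).most_common():
        print(f"Err#{number} {name}: {count}")
```

To use it on a real trace, pass the open log file to tally_truss_errors()
in place of the sample list. A count dominated by one code, as ours was by
error#11, tells you where to look first.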

The general hypothesis was formed:
----------------------------------
When the stock Sun in.named handles authoritative DNS for well over 60
domains, it ends up starving for kernel resources and somehow corrupts the
kernel, breaking it. When we upgraded to 2.5 it worked, perhaps because the
kernel is slightly more robust, but it broke again after we added more
domains. We now run the Berkeley BIND package, and it works beautifully
with about 70 full domains and some very extensive db.* files.
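
To see how close a server is to the point where ours fell over, it helps
to count the zones listed in named.boot. This is a minimal sketch in
Python. It assumes the BIND 4 boot file format, in which each zone is a
line starting with "primary" or "secondary" and ";" starts a comment. The
function name count_zones and the sample domains are made up for
illustration, and the figure of 60 is only our rough observation, not a
documented limit.

```python
from collections import Counter


def count_zones(boot_text):
    """Count primary and secondary zone lines in named.boot text."""
    counts = Counter()
    for raw_line in boot_text.splitlines():
        # Drop anything after a ';' comment marker, then split into fields.
        fields = raw_line.split(";", 1)[0].split()
        if not fields:
            continue  # blank line or comment-only line
        if fields[0] in ("primary", "secondary"):
            counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical boot file contents, for illustration only.
    sample = """\
; hypothetical named.boot
directory /var/named
cache     .                     db.cache
primary   example.com           db.example
primary   0.0.127.in-addr.arpa  db.127.0.0
secondary example.org 192.0.2.1 db.example.org
"""
    zones = count_zones(sample)
    print(f"primary={zones['primary']} secondary={zones['secondary']}")
```

Reading the real file and passing its text to count_zones() gives the
number of zones the server is loading.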

I wish named did not have to run as root... that way it would not be able
to screw anything up! It would not be such a security problem either.
But since it binds to a low port, it has to run as root.

Mark told me that Sun's in.named works on streams. I don't know for certain
exactly why that would be a problem, but I am taking it from the experts
that what we want to do requires an industrial strength BIND program.

My advice to admins who want to run lots of domain names is to get BIND
from Berkeley, like I did, before you have problems with the standard
in.named. It is possible that we are wrong about this, of course, but
since we started using BIND we have had no more problems.

Thanks to Sun Managers for having such a great resource!

-- 
Adrian Jonathan Otto
System Administration
admin@jetlink.net     <-- My JetLink mailbox for work mail
aotto@aotto.com       <-- My personal mailbox for non-work mail


