SUMMARY: autoinstall boot problem

From: Jens Fischer (jefi@kat.ina.de)
Date: Tue Apr 30 1996 - 08:44:03 CDT


Hi sun managers,

First of all many thanks to:
Laura Taylor <ltaylor@voom.bbn.com>
as@uebemc.siemens.de (Andreas Schroeder)

who responded to my question.

Unfortunately I have no solution yet, but I am a little bit closer now.

Here is my original query:

> Hi sun managers,
>
> I have got a problem with an upgrade to Solaris 2.5 of two SUNs currently
> running Solaris 2.2.
> I have set up an install and boot server on one of them, did all the necessary
> configuration and tried to boot the other SUN via the net.
>
> The client starts to boot, but after the tftp of inetboot it halts with the
> following message:
>
> RPC: procedure unavailable
> whoami RPC call failed with rpc status: 10
> panic: boot: could not mount filesystem
>
> and ends up with the ok prompt.
>
> The two SUNs are connected via a remote bridge, without any filtering on the
> bridge. Both systems are LXes, tftp works and gets the right inetboot version,
> bootparamd runs and replies, too.
>
> The problem seems to be similar to the SUN bug-ID 1117036, but as you can see
> in the appended snoop output there seem to be no replies from any other host.
> The only thing that I could imagine is that a timeout occurs due to
> the really long time that the BPARAM WHOAMI reply needs to get out of the
> server.
>
> Does anybody know what causes these problems?
>
> Kind regards - Jens Fischer
...
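
For reference, the number in the "rpc status" message appears to be a value
of the ONC RPC client status enumeration (enum clnt_stat in <rpc/clnt.h>).
The following is a minimal Python sketch for decoding it, assuming the
standard numbering of that header. Only a subset of the codes is listed, and
the helper name decode_boot_error is made up for this illustration.

```python
import re

# Subset of the ONC RPC client status codes (enum clnt_stat).
# Assumption: standard numbering as found in <rpc/clnt.h>.
CLNT_STAT = {
    0: "RPC_SUCCESS",
    5: "RPC_TIMEDOUT",
    6: "RPC_VERSMISMATCH",
    7: "RPC_AUTHERROR",
    8: "RPC_PROGUNAVAIL",
    9: "RPC_PROGVERSMISMATCH",
    10: "RPC_PROCUNAVAIL",
}


def decode_boot_error(line):
    """Return (code, name) for a 'rpc status: N' boot message, or None."""
    match = re.search(r"rpc status:\s*(\d+)", line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, CLNT_STAT.get(code, "unknown")
```

Under that assumption, status 10 decodes to RPC_PROCUNAVAIL, which agrees
with the "RPC: procedure unavailable" line printed just before it.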

Laura Taylor mentioned that I should check the dfstab on the install server.
Actually I had double-checked this before, even by mounting /export/install
by hand on the client. As the snoop shows, the client does not even send a
mount request.

I got a "me too" from Andreas Schroeder, who ran into similar problems. The only
solution he has got from SUN so far is to take every machine off the net that
responds incorrectly to the BPARAM WHOAMI request. I have appended his entire
posting at the end of this mail.

Another sun-managers SUMMARY that just came in gave me a hint on how to find
out which machine replies incorrectly to the request. There is a command,
hostconfig (see the man page), which uses the same protocol. By running
hostconfig -n -v -p bootparams
and snoop simultaneously on the client to be upgraded I found one reply from
an HP:
 12 0.00363 199.245.169.5 -> sunsys2 RPC R (#11) XID=831435698 Procedure unavailable

Unfortunately this HP is a server which cannot be taken off the net.
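
With a longer capture it can be tedious to spot such replies by eye. The
following is a rough Python sketch that picks them out of saved snoop output.
It assumes every line has the same layout as the one shown above (frame
number, time delta, source, "->", destination, summary). The function name
and the default error text are choices made for this illustration, and other
snoop versions may format their lines differently.

```python
import re

# Layout assumed from the snoop line above:
#   <frame> <delta> <source> -> <destination> <summary>
LINE_RE = re.compile(r"^\s*(\d+)\s+([\d.]+)\s+(\S+)\s+->\s+(\S+)\s+(.*)$")


def rpc_error_senders(snoop_lines, error_texts=("Procedure unavailable",)):
    """Return (source, summary) for each RPC reply carrying an error text."""
    hits = []
    for line in snoop_lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a summary line in the assumed layout
        _frame, _delta, source, _dest, summary = match.groups()
        is_rpc_reply = summary.startswith("RPC R")
        if is_rpc_reply and any(text in summary for text in error_texts):
            hits.append((source, summary))
    return hits
```

Feeding it the lines of a capture taken while hostconfig runs should list
every host that answers the broadcast with an error, so each one can be
checked in turn.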

This led me to the next question:
Does anybody know how to turn off these "procedure unavailable" responses on an
HP?

Any hints would be nice.

TIA - Jens Fischer

The following is the reply from Andreas Schroeder:

----- Begin Included Message -----

>From as@uebemc.siemens.de Thu Apr 25 09:42 MET 1996
From: as@uebemc.siemens.de (Andreas Schroeder)
Date: Thu, 25 Apr 1996 09:06:10 +0200
To: fischjns@kat.ina.de
Subject: Re: autoinstall boot problem

Hi,

I recently tried to update ~70 suns via a bootserver with Solaris 2.4
in one of our subnets and got the same error message as you:

> RPC: procedure unavailable
> whoami RPC call failed with rpc status: 10
> panic: boot: could not mount filesystem

My snoop output looks like yours, and a network analyzer confirmed that we got
all the packets.
Our problem is a Novell server PC with some third-party NFS software. It replied
much more quickly to the client's (bonny) "BPARAM WHOAMI" request than
our SS1000E (clyde)!

1348 0.00045 clyde -> bonny TFTP Data block 319 (372 bytes) (last block)
1349 0.00202 bonny -> clyde TFTP Ack block 319
1350 0.10757 OLD-BROADCAST -> (broadcast) RARP C Who is 8:0:20:75:6a:c3 ?
1351 0.00414 clyde -> bonny RARP R 8:0:20:75:6a:c3 is 132.29.3.54, bonny
1352 0.00124 bonny -> 132.29.255.255 BPARAM C WHOAMI? 132.29.3.54
1353 0.00138 132.29.1.127 -> (broadcast) ARP C Who is 132.29.3.54, bonny ?
1354 0.01696 clyde -> bonny BPARAM R WHOAMI? bonny in emc

After disabling the NFS service on the PC everything worked fine again!
Some investigation of the related RFCs showed that this reply isn't really wrong, so
the behavior of the Suns must be at fault!
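
The race described above can be illustrated with a small Python simulation.
This is only a sketch: that the boot client acts on the first reply it
receives is inferred from the symptoms, not taken from the inetboot source,
and the arrival time of the PC's reply is a made-up value. The server's
arrival time is roughly the sum of the deltas in the trace above.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    host: str
    arrival: float  # seconds after the WHOAMI broadcast
    ok: bool        # True for a valid answer, False for an RPC error


def first_reply_wins(replies):
    """Model of the observed behaviour: act on whichever reply arrives first."""
    return min(replies, key=lambda r: r.arrival)


def first_success_wins(replies):
    """Alternative: skip error replies and take the earliest valid answer."""
    good = [r for r in replies if r.ok]
    return min(good, key=lambda r: r.arrival) if good else None


replies = [
    Reply("clyde", 0.018, ok=True),       # boot server, roughly as in the trace
    Reply("novell-pc", 0.004, ok=False),  # hypothetical arrival time
]
```

With these values first_reply_wins picks the error from the PC, while
first_success_wins would wait for clyde. The second variant is what one would
hope a fixed client does.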

So I opened a call with Sun Service. They already knew about the problem, but the only
solution they offered us was a patched inetboot, which didn't solve the problem.
So they recommended that we put this PC behind a router.

IS IT REALLY A SOLUTION TO KICK EVERYBODY OFF THE NET WHO DOESN'T ACT AS SUN EXPECTS???

We have a lot of different types of computers on our LAN, so it is only a matter of
time until the next trouble comes.

We decided to escalate the problem at Sun Service (we have a gold maintenance contract),
but they said that we can't expect a quick solution (> 6 months!).

Fortunately our net admins and the owner of the Novell PC agreed to install a router.
So we can proceed with our work this time.

Here is another one:

> From sun-managers-relay@ra.mcs.anl.gov Wed Apr 17 14:17:06 1996
> Subject: SUMMARY: Diskless client won't boot (rpc call failed)
> X-Sender: peter@POPServer
> To: sun-managers@eecs.nwu.edu
> X-Envelope-To: sun-managers@eecs.nwu.edu
> Mime-Version: 1.0
> Content-Transfer-Encoding: 7BIT
>
> Original message:
>
> >> After a period of proper booting the clients now refuse to boot. The
> >> symptom is as follows: when "boot" is entered at the monitor prompt, the
> >> normal hex numbers appear on the screen. Instead of the usual hostname
> >> message we then get:
> >>
> >> RPC: Authentication error:
> >> local: unkown error.
> >> whoami RPC call failed with rpc status: 7
> >>
> >> panic - boot: Could not mount filesystem.
> >>
> >> The server and diskless client are running SunOS 5.4 patchlevel 36. All
> >> further recommended patches are installed (although not always the latest
> >> versions).
> >> Neither NIS nor NIS+ is used.
>
> Sun came up with the suggestion to use the program 'snoop' to analyze the
> IP traffic during the boot. After the tftp boot ends, the hostname is asked
> through rpc calls or broadcasts, whatever. Snoop showed that, in response
> to this request, the client receives error messages from SGI machines in an
> entirely different subnet. It turned out that the system manager of the
> SGIs had changed the portmapper configuration a bit a few weeks earlier. He
> configured the software such that it should respond only to requests from
> within its own subdomain (as is advised in the Satan software docs). So
> the RPC: Authentication error: came from the SGI and not from our boot
> server. Apparently there is a bug somewhere in Sun's rpc software. The
> work-around that we use now is that the SGI portmapper config is changed
> such that our subdomain is given permission to use their portmap services.
> When we boot in this situation, snoop reports only traffic between boot
> server and client, as should be the case; no packets from the SGIs are
> seen.
>
> Sun created a bug-id on this problem, namely 1246402.
>
> BTW, the problem was not only with booting diskless nodes, but also with
> starting certain software running on the server. For instance, we use
> Helios ethershare for AppleTalk services. The atalkd (part of ethershare)
> didn't start during the time the SGIs had their security restrictions. With
> the work-around described above, these problems also disappeared.
>
> Thanks to all who responded,
>
> Peter.
>
>
>

Hope that helps you.

Regards,

        Andreas Schroeder

--
+--------------------------+------------------------------------+
| Andreas Schroeder        |  Internet:   as@uebemc.siemens.de  |
| Siemens AG, OeN ED A 41  |                                    |
| Hofmannstrasse 51        |  Phone:      +49 89 722 23985      |
| 81359 Muenchen           |  Fax:        +49 89 722 33887      |
+--------------------------+------------------------------------+

----- End Included Message -----


