SUMMARY: jumpstart now fails "Cannot mount root"

From: David W. Blaine (blained@gdls.com)
Date: Thu May 06 1999 - 10:52:49 CDT


Many thanks to:

Richard Felkins
Brooke King
Vincent Lescoe
Eddy Fafard
Parks Fields
Matthew Stier
Christophe Colle
Martin Oksnevad
Shouben Zhou
and probably others....

Before I give the "solution" to the problem, you must understand our network
(Heck, I don't understand it but I'll try to explain it). A majority of our UNIX
boxes sit on a flat network subnetted with a netmask of 255.255.192.0. These
UNIX boxes and our PCs (Win95/NT) share this net.
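
For a sense of scale, the netmask above can be worked out as follows. This is
only a minimal sketch in POSIX shell; the helper names mask_to_prefix and
usable_hosts are hypothetical and not part of any Solaris tool.

```shell
#!/bin/sh
# Hypothetical helper: count the 1-bits in a dotted-quad netmask.
mask_to_prefix() {
    bits=0
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the mask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        while [ "$octet" -gt 0 ]; do
            bits=$((bits + (octet & 1)))
            octet=$((octet >> 1))
        done
    done
    echo "$bits"
}

# Hypothetical helper: usable host addresses for a given prefix length
# (all addresses minus the network and broadcast addresses).
usable_hosts() {
    echo $(( (1 << (32 - $1)) - 2 ))
}

prefix=$(mask_to_prefix 255.255.192.0)
echo "prefix length: /$prefix"                    # prints "prefix length: /18"
echo "usable hosts:  $(usable_hosts "$prefix")"   # prints "usable hosts:  16382"
```

In other words, a broadcast on this net can reach up to 16382 hosts, UNIX and
PC alike, and any one of them may answer it.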

Here's the low-down on the problem:

I ran several snoops from other Solaris boxes on the same net as the
jumpstart server and client. Most did not see all the packets being transmitted,
which was extremely strange. So I took a PC connection and plugged it into the
back of a Solaris box sitting next to it (remember, the PCs are on the same net
as the UNIX boxes). This snoop showed far more traffic than the previous
attempts, and in it I found a bogus reply from an NT box. I tracked this box
down through our NT Server organization. It runs ArcServe. Curiously, I found
this exact problem in SunSolve (BUG ID# 4216302). After having them disable
ArcServe, I was able to perform jumps again. In the short term, the NT Server
organization has agreed to disable ArcServe during the day and only activate it
at night to complete their backups. Unfortunately, we do many of our jumpstarts
after hours to avoid bogging down the network during peak usage. If anyone knows
of a fix for either NT or ArcServe on this issue, I would appreciate your
response.
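
The search described above can be sketched roughly as follows. This is a
minimal sketch under stated assumptions: the interface name hme0 is a guess for
a typical Sun box, "bootparam" is assumed to be the program name listed in
/etc/rpc, and the helper unexpected_responders with its one-hostname-per-line
input is hypothetical (the hostnames have to be pulled out of the snoop output
by hand or by your own script).

```shell
#!/bin/sh
# Watch only bootparams RPC traffic while the client runs "boot net - install".
# Assumptions: interface hme0; "bootparam" is the RPC program name in /etc/rpc.
# Run as root on a machine that can actually see the traffic:
#
#   snoop -d hme0 rpc bootparam

# Hypothetical helper: given the name of the one server that SHOULD answer and
# a list of hosts that did answer (one per line on stdin), print every other
# host once.
unexpected_responders() {
    expected="$1"
    # -v invert the match, -x whole line only, -F fixed string (no regex)
    grep -v -x -F -e "$expected" | sort -u
}

# Example with made-up hostnames:
#   printf 'jumpsrv\nntbox\njumpsrv\n' | unexpected_responders jumpsrv
```

Any name printed by the helper is a machine that has no business replying to
the client's whoami request.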

Here are some of the list's responses:

David:

This is caused by a faulty portmap server on your network. Most likely
it is an SGI system with a misconfigured setup. You can use snoop
from another computer to monitor the packets while you boot one
machine over the net. That will show you which one is the bad guy.

Normally the portmap server should ignore a broadcast message that
is not meant for it. Unfortunately, instead of ignoring the message, it
sends a rejection message back to the sender.

I have had these exact error messages three times in the past years. I am
pretty much an expert on this one.

Good luck,

*--------------------------------------------------------------*
* Institute for Computer Applications in Science & Engineering *
* Shouben Zhou | *
* ICASE, Mail Stop 132C | *
* 3 West Reid St. Bldg. 1152 | Phone: (757) 864-6558 *
* NASA Langley Research Center | Fax: (757) 864-6134 *
* Hampton, VA 23681-2199 | Email: szhou@icase.edu *
*--------------------------------------------------------------*
----------------------------------
Hi,

Can you please summarize if you get a solution (hopefully soon).

I have the same urgent "whoami" problem when I try to do "boot net" with
solaris2_6hw_0398 from both a 2.5.1 and a 2.6hw0398 install server.

I am not doing anything with "jumpstart" as I only want to install
solaris2_6hw_0398 from scratch on 30+ machines from an install server
so I don't have to connect/disconnect a CD-drive on each machine.

Thanks.

Martin
------------------------------------
Hmm, are you sure that your jumpstart server has a read/WRITE export on the
OS disk? You probably forgot to export it ....

   share -F nfs -o anon=0 /export/install
   share -F nfs /export/config

cc
------------------------------------
Run 'share' on your jumpstart server to see whether
the exported file systems have changed. If the share
isn't in /etc/dfs/dfstab, the rc script for nfs.server
won't start it at the next reboot.

--- Richard.

liv:/home/ull/richardf:% share
- /cdrom/cdrom ro=saic:nmrd:sandia ""
- /export/liv/designer rw=saic:nmrd:sandia ""
- /export/liv/Sol_2.6 ro,anon=0 ""
liv:/home/ull/richardf:%

----------------------------------------------------------------------
Richard L. Felkins Systems Administrator
Science Applications International Corporation (SAIC)
10260 Campus Point Dr. Email: richardf@gso.saic.com
San Diego, CA 92121 Phone: (619) 646-3321 FAX: (619) 458-4993
----------------------------------------------------------------------
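
The dfstab check suggested above can be sketched as a small shell function.
This is a minimal sketch, not a Solaris-supplied tool: the helper name
is_in_dfstab is hypothetical, and it only looks for an uncommented "share" line
that names the path exactly. On Solaris itself you may need nawk instead of awk
for the -v option.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the dfstab-style file contains an
# uncommented "share" line whose arguments include the given path.
# Usage: is_in_dfstab PATH [FILE]     (FILE defaults to /etc/dfs/dfstab)
is_in_dfstab() {
    path="$1"
    file="${2:-/etc/dfs/dfstab}"
    awk -v p="$path" '
        $1 == "share" {                  # comment lines start with "#"
            for (i = 2; i <= NF; i++)
                if ($i == p) found = 1
        }
        END { exit !found }
    ' "$file"
}

# Example:
#   is_in_dfstab /export/install || echo "/export/install will not survive a reboot"
```

A path that shows up in the output of 'share' but fails this check was shared
by hand and will be gone after the next reboot.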

------------------------------------
We had a problem like this once. It turned out an unused network
CDROM array was responding, badly, to the whoami requests of the
client. We discovered this with snoop and turned off the CDROM
array.

Brooke King
------------------------------------
Another machine is responding to the boot request. I had the same thing happen
to me. Do a snoop on the jumpstart server and see if you can see who else is
replying to the boot request.

vjl

-------------------------------------
Just saw that one yesterday. This time the name of the
system in the boot server's hosts table differed from the one
in the NIS maps. I have seen it before as well; that time the cause
was an old boot server with wrong IP information that I forgot
to decommission during a large move.

Eddy
-------------------------------------
David

My guess is something in your name service changed.
Try fully qualifying all host names in the bootparams, hosts, and dfstab files.

This has happened to me a couple of times.

parks
--------------------------------------
You have multiple systems responding with different bootparams information.

This may happen especially when you use wildcards in the bootparams file.
--------------------------------------

 

------------- Begin Forwarded Message -------------

Date: Wed, 5 May 1999 10:13:42 -0400 (EDT)
From: "David W. Blaine" <blained@gdls.com>
Subject: jumpstart now fails "Cannot mount root"
To: sun-managers@sunmanagers.ececs.uc.edu
MIME-Version: 1.0
Content-MD5: 9/hDItmgxnjEsCt95yQ0Iw==

Hi Sun-gurus:

I have a serious problem that cropped up all of a sudden. I have not changed
the jumpstart on two servers but I get the following on both when I attempt to
jump a client: (NOTE: I have used these jump servers before without a problem -
well, until now.)

boot net - install
....
hostname: [target machine]
domainname: [NIS domain]
root server: [jumpstart server]
root directory: /jumpstart/solaris2_6hw_0398/Boot
....
WARNING: whoami: bootparam: RPC failed: error 7 (RPC Authentication Error)
WARNING: nfsdyn_mountroot: NFS3 mount_root failed: error 6

It goes on to say "Cannot mount root" and immediately causes a system panic.

Both servers are running Solaris 2.6 with the latest recommended and Y2K
patches. I am attempting to jump a machine with Solaris 2.6 hw 03/98.
Curiously, jumping the machine to Solaris 2.5.1 hw 11/97 works ok on either
server. Both servers are on the same network. I checked to make sure the target
machine did not exist in any other ethers or bootparams files. I have another
jumpstart server on a separate network. Using this machine to jump the target
machine to Solaris 2.6 hw 03/98 worked. So the problem appears to be external to
the servers but do any of you know what???!!!!

------------------
David Blaine (blained@gdls.com)
Computer Systems Engineer
CSC for GDLS
Phone: 810-825-7650

------------- End Forwarded Message -------------



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:13:19 CDT