SUMMARY: No network after software install

From: Robert Alexander (ra@ftn.net)
Date: Wed May 03 2000 - 18:23:34 CDT


Dear Gurus,

This summary has been a VERY long time in coming; sorry about that,
but I just recently got the machine back, and just recently got the
problem solved (I think).

My original question:
>Hi All,
>
>the error: "le0: No carrier - cable disconnected or hub link test disabled?"
>
>question: is this ONLY a hardware error or could it be caused by
>software/config? If so, what?
>
>System:
>Sun Ultra 1 SBus (UltraSPARC 143MHz), OpenBoot 3.5, 192 MB memory installed
>Solaris 7 with patches up to mid-December. (SunOS Release 5.7
>Version Generic_106541-08)
>
>Background:
>I was working remotely on the Ultra 1 (assisted by a friend of mine
>who is a MUCH more experienced Sun admin than I am <G>) installing
>an upgrade to Apache and compiling-in support for SSL, mod-perl, and
>PHP3.
>
>All went well, and Apache re-launched successfully. I then made a
>few changes to my bind/DNS setup because I'd been having a bit of
>trouble with sendmail.
>
>Made a couple changes to 'named.conf' and my reverse file, then,
>just for the sake of a really clean startup after all the software
>changes, we rebooted. ('shutdown -y -g0 -i6')
>
>It never came back up on the network.
>
>I'm able to connect remotely to the console (ttya), so I did.
>During the boot it said: le0: No carrier - cable disconnected or hub
>link test disabled?
>
>Hmm. that LOOKS like a hardware error.
>
>After getting to the shop on Monday morning I opened up the box,
>wiggled all the cables, etc., but no joy.
>
>ifconfig returns :
># ifconfig -a
>lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
> inet 127.0.0.1 netmask ff000000
>le0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
> inet xxx.xxx.xxx.162 netmask fffffff0 broadcast xxx.xxx.xxx.175
> ether 8:0:20:79:bc:ee
>
>and from the 'ok' prompt I get:
>ok watch-net-all
>/sbus@1f,0/ledma@e,8400010/le@e,8c00000
>
>Using AUI Ethernet Interface
> Internal loopback test -- Ethernet chip initialization failed
>Using TP Ethernet Interface
> Internal loopback test -- Ethernet chip initialization failed
>
>
>Like I said, this sure LOOKS like the NIC on the motherboard is bad.
>It's just that the timing is so suspicious -- that it would fail on
>reboot after installing software.

The real quick answer is that it WAS a software issue, though I'm
unsure just *what* it was. I suspect it's related to Mark Sherman's
response below, which raises the question 'will it happen again?' See
my new question 'Solaris 7 and OpenBoot 3.x on Ultras'.

The system has been brought back online after a fresh install of
Solaris 7 11/99. The OpenBoot firmware has also been upgraded from
3.5 to 3.11.

Several people were kind enough to reply to my initial question.
Here are their responses:

At 13:25 -0800 2000/02/23, David Foster wrote:
>Try undoing your changes and see if the problem goes away. The above
>problem is a sendmail problem, and can be fixed by putting in lines like:
>
>Cwhostname
>Dwhostname
>
>in /etc/mail/sendmail.cf and sending -HUP to sendmail daemon.
>
>I always like to start out assuming that *I* caused the problem, especially
>if I just changed something, and go from there.

At 16:46 -0500 2000/02/23, David B. Harrington wrote:
>It is (probably) a pure hardware problem.
>
>Possibilities:
> a loose or broken cable,
> a bad NIC card,
> a bad hub connection,
> a bad hub.
>
>You'll have to troubleshoot the possibilities, but I'd start with the cable.
>With the ifconfig response, I'd tend to discount the card.
>
>A slight possibility is the eeprom setting "tpe-link-test?" (default
>= true).

At 16:50 -0500 2000/02/23, Mark Sherman wrote:
>Robert, this sounded familiar, and i found this, it may help:
>
>SRDB ID: 17780
>
>SYNOPSIS: watch-net-all Open Boot PROM test fails with "ethernet chip
>initialization failed" message
>
>DETAIL DESCRIPTION:
>
>Machines with OBP 3.x such as the Ultra 1, 2, 5, 10, 30, 60, 450 and Ultra
>Enterprise servers (3000, 3500, etc) may fail Open Boot PROM tests such
>as watch-net, watch-net-all or test net.
>
>Here's an example test from an Ultra 1 machine:
>
> OK> test net
> Using AUI Ethernet Interface
> Internal loopback test -- Ethernet chip initialization failed
> Using TP Ethernet Interface
> Internal loopback test -- Ethernet chip initialization failed
> net self test failed. Return code = -1
>
>SOLUTION SUMMARY:
>
>Do not conclude at this point that there is a hardware error. Prior to
>performing any OBP diagnostics or tests, the system should be reset or
>power cycled. Execute the following commands:
>
> OK>setenv auto-boot? false
> OK>reset-all
>
>(Note: If testing on an Ultra Enterprise server machine, you might want
>to add the OBP command "setenv diag-level min" to speed things up a little
>BEFORE doing the "reset-all".)
>
>The screen will blank, and the system will come back to the OK> prompt.
>
>Now let's retest the net device and see what happens.
>
> OK> test net
> Using AUI Ethernet Interface
> Internal loopback test -- succeeded
> External loopback test -- lost carrier (transceiver cable
> problem?) send failed.
> This is because the AUI port is not connected.
> The twisted pair port is connected.
> Using TP Ethernet Interface
> Internal loopback test -- succeeded
> External loopback test -- succeeded
>
>If the retest still shows a failure, additional troubleshooting is needed.
>
>Finally, do not forget to reset auto-boot? back to true.
>
> OK> setenv auto-boot? true

At 15:00 -0700 2000/02/23, Barry Gamblin wrote:
>Check the light on the hub or switch port you are connected to.
>Sounds like you did indeed lose the ethernet hardware, and it
>was just coincidental to your software install.

(love the handle :>)
At 14:37 -0800 2000/02/23, Lusty Wench wrote:
>I've seen what you describe many times on Ultra 1s while I was working at
>Sun. In _some_ cases, it can help to power down the machine entirely for
>a while (often a minute or two was ok; once, in disgust, I waited something
>like a half hour or longer). Doing this most often "resolved" the problem.
>In much rarer cases (only a handful of times that I can recall) it actually
>did require replacing the motherboard.

At 10:35 +1030 2000/02/24, Brett Lymn wrote:
> >question: is this ONLY a hardware error or could it be caused by
> >software/config? If so, what?
> >
>
>No, it can be a software setting. Check that this eeprom parameter is
>set like this:
>
>tpe-link-test?=true
>
>If this is not set to true then the machine may behave as if the
>network HW has died - some hubs wait for the link test before they
>enable the port which may be what is biting you.
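
The tpe-link-test? setting mentioned above can also be checked from a
running Solaris system, without dropping to the ok prompt. The following
is only a minimal sketch: it assumes the standard eeprom(1M) utility,
which prints a variable as a single "name=value" line, and the helper
name link_test_state is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solaris): classify one line in the
# "name=value" form that eeprom(1M) prints for tpe-link-test?.
link_test_state() {
    case "$1" in
        'tpe-link-test?=true')  echo enabled ;;
        'tpe-link-test?=false') echo disabled ;;
        *)                      echo unknown ;;
    esac
}

# On the affected machine, as root, the check would look something like:
#   link_test_state "$(eeprom 'tpe-link-test?')"
# and, if it reports "disabled", the variable can be set back with:
#   eeprom 'tpe-link-test?=true'
```

The quotes around the variable name matter, since an unquoted "?" is a
shell wildcard. A changed value should only be relied on after the next
reset or reboot.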

At 12:13 +0000 2000/02/24, robsonk@ebrd.com wrote:
>Planes crash most often on take-off and landing due to the additional
>stresses of these activities; this is often true of computers too.....


