Summary: Sun Blade 100 - strange behavior after firmware update.

From: Scott Mickey <mickey_at_denver.net>
Date: Tue Aug 22 2006 - 12:13:00 EDT
Sun Managers, 
 
I have a Sun Blade 100 that, after an OBP firmware update to 
version 4.17.1 and installation of Solaris 10 01/06, 
kept dropping to the ok prompt with a 'RED State Exception' 
after exactly 15 minutes of system inactivity.  This 
appears to be a problem in Solaris 10 01/06 (installed 
from Sun DVD p/n 708-0118-10), not OBP firmware version 
4.17.1.  The solution is to run this command under 
Solaris 10: 

# svcadm disable system/power:default

There appears to be an incompatibility between Solaris 10 
01/06 power management and the mainboard in this Sun Blade 
100, Sun part number 375-0096. 
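
For readers applying the same fix, here is a minimal sketch
(not the exact procedure used on this machine).  It assumes
the standard Solaris 10 SMF tools svcs(1) and svcadm(1M); the
function names power_state and ensure_power_disabled are made
up for illustration.  It checks the current state first and
disables the service only if needed:

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not from the original post.
# Assumes Solaris 10 SMF commands svcs(1) and svcadm(1M) are on the PATH.

FMRI=system/power:default

power_state() {
    # -H omits the header line, -o state prints only the state column
    svcs -H -o state "$FMRI"
}

ensure_power_disabled() {
    state=`power_state` || return 1
    case "$state" in
        disabled*)
            echo "$FMRI already disabled"
            ;;
        *)
            # Without -t, the disable persists across reboots
            svcadm disable "$FMRI" && echo "$FMRI disabled (was: $state)"
            ;;
    esac
}
```

Source the functions and run ensure_power_disabled as root.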
 
Thanks to Filo Smith who wrote: 
> Gotta be power management then. 
Filo's email caused me to redouble my efforts to find a 
solution that involved power management. 
Simply killing the powerd process did not work. 
Renaming powerd so the system could not find it at reboot 
did not work: 
# mv /usr/lib/power/powerd /usr/lib/power/powerd-DISABLED
Downgrading the OBP firmware version from 4.17.1 back 
to the original version 4.0.45 (followed by the 
set-defaults command) did not work. 
Reinstalling Solaris 9 09/04 did not work. 
Swapping components with another Sun Blade 100 revealed 
the 'RED State Exception' problem was resident on the 
mainboard, but it stubbornly refused to clear itself. 
Note that the Sun Blade 100 has two batteries on the 
mainboard: one inside the old-style (large) IDPROM chip, 
and a second lithium CR2032 battery. 
I did pull the IDPROM chip off the mainboard and pulled 
the CR2032 battery and waited some time, hoping the 
errant power management settings would be forgotten by 
the mainboard, but this did not work either. 
In the end, I upgraded the OBP firmware back to version 
4.17.1 again, installed Solaris 10 01/06 again, then just 
ran the command: 
# svcadm disable system/power:default
A simple solution, but not the course of action I took 
the first time the problem appeared.  My years of 
experience told me to put the machine back in its 
original state when the problem arose (old OBP version, 
old Solaris version), but this time that was not the 
correct action to take.  It appears Solaris 10 01/06 
broke something, and only by using Solaris 10 could I 
fix it.  Since this machine is being used as a server 
with no display/keyboard/mouse, power management needs 
to be disabled anyway, so leaving it off is not an issue 
for this machine. 
While most 'RED State Exception' errors are solved by 
finding and replacing defective hardware, this was not 
the case this time. 
 
Thanks very much to all the Sun Managers who took time 
to email me their ideas and experiences to help me solve 
this problem. 
 
For anyone seeing the same problem in the future, I'll 
throw in a bit more info to help the search engines 
find this email. 
On ttya, the system drops to the ok prompt with these 
messages: 

RED State Exception

TL=0000.0000.0000.0005  TT=0000.0000.0000.0064 
   TPC=ffff.ffff.d6ca.2bfc  TnPC=ffff.ffff.9100.726c  TSTATE=0000.0099.5800.1505 
TL=0000.0000.0000.0004  TT=0000.0000.0000.0010 
   TPC=0000.0000.0100.87fc  TnPC=0000.0000.0100.8800  TSTATE=0000.0099.5804.1405 
TL=0000.0000.0000.0003  TT=0000.0000.0000.0064 
   TPC=ffff.ffff.d6ca.2bfc  TnPC=ffff.ffff.9100.726c  TSTATE=0000.0044.5800.1505 
TL=0000.0000.0000.0002  TT=0000.0000.0000.0010 
   TPC=0000.0000.0100.0688  TnPC=0000.0000.0100.068c  TSTATE=0000.0044.5800.1505 
TL=0000.0000.0000.0001  TT=0000.0000.0000.0034 
   TPC=0000.0000.0104.0ad4  TnPC=0000.0000.0104.0ad8  TSTATE=0000.0044.0000.1605 

ERROR: error-reset-cleanup: Externally Initiated Reset has occurred.
     ERROR: Last Trap: Externally Initiated Reset

ok 
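
When comparing dumps like the one above across crashes, it
can help to reduce each one to its trap level (TL) and trap
type (TT) pairs.  The following is a rough sketch, assuming
the console output was captured to a file on another host;
the function name summarize_traps and the AWK variable are
illustrative.  On Solaris itself, set AWK=nawk, since the
old /usr/bin/awk lacks sub() and gsub().

```shell
#!/bin/sh
# Sketch only: summarize_traps is an illustrative name.
# Reads an OBP RED State dump on stdin and prints one "TL=n TT=0xnn"
# line per trap level, dropping the dots and leading zeros.

AWK=${AWK:-awk}

summarize_traps() {
    $AWK '/^TL=/ {
        tl = $1; tt = $2
        sub(/^TL=/, "", tl); sub(/^TT=/, "", tt)
        gsub(/\./, "", tl);  gsub(/\./, "", tt)
        sub(/^0+/, "", tl);  sub(/^0+/, "", tt)
        if (tl == "") tl = "0"
        if (tt == "") tt = "0"
        printf "TL=%s TT=0x%s\n", tl, tt
    }'
}
```

Fed the dump above, this prints TL=5 TT=0x64, TL=4 TT=0x10,
TL=3 TT=0x64, TL=2 TT=0x10 and TL=1 TT=0x34, one per line.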


If input/output is changed from ttya to keybd/screen, 
then these messages are printed on the screen: 

ok FATAL: no exception frames available, forcing misaligned trap
ok FATAL: no exception frames available, NESTED ERRORs, going interactive
(repeats several dozen times, then):
Rejecting alloc-mem!Rejecting alloc-mem!...(repeats)
 
Under Solaris 10, with power management disabled via: 
# svcadm disable system/power:default
once the system has been up more than 15 minutes, 
running this command locks the system immediately: 
# svcadm enable system/power:default
One would think 15 minutes of system inactivity would 
need to elapse before the system would crash after 
power management was re-enabled, but it seems whatever 
power management timer is involved (ACPI?) has already 
counted down to zero, so the system reacts with 
hair-trigger speed (immediately). 
 
Scott Mickey 
 

-------- Original Message --------
Subject: Sun Blade 100 - strange behavior after firmware update.
Date: Fri, 18 Aug 2006 12:52:37 -0600
From: Scott Mickey <mickey@denver.net>
To: sunmanagers@sunmanagers.org

Sun Managers, 
 
I updated the firmware on a Sun Blade 100, and now after 
exactly 15 minutes with the system idle, it drops to the 
ok prompt with these messages: 
 
> RED State Exception
> ERROR: error-reset-cleanup: Externally Initiated Reset has occurred.
> ERROR: Last Trap: Externally Initiated Reset
 
If booted in single-user mode, or if the system is kept busy, 
then this never happens.  System stays up indefinitely. 
 
Solaris 10 01/06 and Solaris 9 09/04 both install without 
error (as the machine is kept busy).  However, after OS 
installation is complete and machine goes idle, 15 minutes 
later the 'RED State Exception' happens and it drops to 
the ok prompt.  
 
Background info: 
This machine was very reliable and trouble free with 
original OBP firmware, version 4.0.45.  Ran Solaris 9, 
headless (no USB keybd or mouse, no monitor), with 2x 
80GB IDE disks, primarily as a jumpstart and SAMBA server. 
Idle nights and weekends, and sometimes extremely busy 
during work days.  Never a crash, no errors, no problems. 
A good little machine. 
 
Upgraded to OBP firmware 4.17.1 using Sun patch 119235-01, 
dated Apr/29/2005.  Installed Solaris 10 from DVD without 
error, but then 'RED State Exception' happened. 
 
Downgraded OBP firmware back to 4.0.45 using patch 111179-01, 
and reinstalled Solaris 9, but 'RED State Exception' problem 
remained.  Again, only after 15 minutes of system inactivity 
at run-level 3 or run-level 2. 
 
Using parts from another Sun Blade 100, swapped memory, 
then CPU, then IDPROM chip, and then power supply.  
Problem remained.  Put the mainboard (Sun p/n 375-0096) 
into another Sun Blade 100 chassis (this one had just one 
10 GB IDE drive), and did a Solaris 9 install.  Problem 
remained.  The problem is on the mainboard, but it is 
NOT random.  I can tell within 30 seconds when the 
'RED State Exception' will occur, by running this script 
in an ssh window immediately after boot: 
 
$ cat show_uptime
#!/bin/sh - 
while :
do
  uptime
  sleep 60
done
 
Here is the output: 
$ ./show_uptime
 4:18pm up 1 min(s), 1 user, load average: 0.35, 0.15, 0.06
 4:19pm up 2 min(s), 1 user, load average: 0.14, 0.13, 0.05
 4:20pm up 3 min(s), 1 user, load average: 0.05, 0.11, 0.05
 4:21pm up 4 min(s), 1 user, load average: 0.02, 0.09, 0.05
 4:22pm up 5 min(s), 1 user, load average: 0.01, 0.07, 0.05
 4:23pm up 6 min(s), 1 user, load average: 0.00, 0.06, 0.04
 4:24pm up 7 min(s), 1 user, load average: 0.00, 0.05, 0.04
 4:25pm up 8 min(s), 1 user, load average: 0.00, 0.04, 0.04
 4:26pm up 9 min(s), 1 user, load average: 0.00, 0.03, 0.04
 4:27pm up 10 min(s), 1 user, load average: 0.00, 0.03, 0.04
 4:28pm up 11 min(s), 1 user, load average: 0.00, 0.02, 0.03
 4:29pm up 12 min(s), 1 user, load average: 0.00, 0.02, 0.03
 4:30pm up 13 min(s), 1 user, load average: 0.00, 0.02, 0.03
 4:31pm up 14 min(s), 1 user, load average: 0.00, 0.01, 0.03
 4:32pm up 15 min(s), 1 user, load average: 0.00, 0.01, 0.03
(Then RED State Exception and drops to ok prompt). 
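
To confirm the 15-minute pattern over repeated runs, the last
uptime value seen before each drop can be pulled out of the
captured output.  This is only a sketch: it assumes the
output of show_uptime was saved on the ssh client side (so
nothing extra is written to the Blade's own disks), and the
function name last_uptime_minutes is illustrative.

```shell
#!/bin/sh
# Sketch only: last_uptime_minutes is an illustrative name.
# Reads uptime lines in the "up N min(s)," form shown above on stdin
# and prints the last minute count seen before the output stopped.

last_uptime_minutes() {
    sed -n 's/.* up \([0-9][0-9]*\) min(s),.*/\1/p' | tail -1
}
```

For the run shown above, last_uptime_minutes prints 15.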
 
In single user mode, system runs fine: 
# uptime
 6:34pm up 17:42, 0 users, load average: 0.00, 0.00, 0.01
 
Or if I open a second ssh window and run this script, 
it runs fine: 
$ cat find_usr
#!/bin/sh - 
while :
do
    find /usr -print
    sleep 5
done
 
I need to be honest and admit that neither Sun Blade 100 
has Sun-branded memory or Sun-branded hard disks. 
However, this isn't an enterprise-class machine by any 
stretch or measure, so that should not be a factor. 
The memory is good memory, as are the disks. 
I guess I could do another OBP firmware upgrade on 
another Sun Blade 100 to see if this is a repeatable 
error, but then I might have two useless Sun Blade 100s. 
 
Doing an OBP firmware upgrade and OS reinstall is a very 
common procedure.  I'm sure someone out there must have 
seen this problem also.  I know this machine is a FRU, 
but I would like to get it working again, rather than 
throw it in the recycle bin.  I look forward to your 
emails, with accounts of successful and unsuccessful 
Sun Blade 100 OBP firmware updates.  Thanks! 
 
Oh, and why did I do an OBP firmware update in the first 
place?  I wanted to try out the OBP 'wanboot' feature, 
available only in OBP versions 4.17 and above. 
 
Also, if someone at Sun Microsystems could please forward 
this to the person or persons in charge of OBP firmware 
for the Sun Blade 100/150 series, I would really appreciate 
it. 
 
Scott Mickey
Received on Tue Aug 22 12:13:32 2006
