Summary: error cannot stop cpu1

From: Richard Butler <rbutler_at_ibc.cnr.it>
Date: Wed Jan 15 2003 - 10:20:18 EST
The overwhelming consensus was that this is definitely a hardware
problem: call Sun tech support. I am doing this immediately.

While I am waiting I will check out the other useful suggestions that I
received: checking the memory seating, and looking at patch 108528,
which has to do with various panic situations. (The system was recently
brought fully up to date with the recommended and security patches, but
the problem could have started with the upgrade from 108528-16 to
108528-17. I might as well back out from 17 to 16 and see what
happens ...).
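
Before and after backing out, it is worth confirming which revision of
108528 is actually installed. The sketch below is only an illustration:
it assumes the usual "Patch: <id>-<rev> Obsoletes: ..." line layout of
`showrev -p`, the SAMPLE text is made up for the example (not taken
from my machine), and the function name is my own. The backout itself
would be `patchrm 108528-17`, assuming the backout data was saved when
the patch was applied.

```python
import re

# Hypothetical sample in the style of `showrev -p` output (illustration only;
# the package names and the second patch line are invented for this sketch).
SAMPLE = """\
Patch: 108528-17 Obsoletes: Requires: Incompatibles: Packages: SUNWcsu, SUNWcsr
Patch: 108987-09 Obsoletes: Requires: Incompatibles: Packages: SUNWadmc
"""


def installed_revisions(showrev_text, base_id):
    """Return the sorted revision numbers listed for one patch base ID."""
    pattern = re.compile(
        r"^Patch:\s+" + re.escape(base_id) + r"-(\d+)\b", re.MULTILINE
    )
    return sorted(int(rev) for rev in pattern.findall(showrev_text))


if __name__ == "__main__":
    # On the real system, SAMPLE would be replaced by saved `showrev -p` output.
    print("108528 revisions listed:", installed_revisions(SAMPLE, "108528"))
```

If the backout worked, the same check on fresh output should no longer
list revision 17.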


Thanks to:
John E. Riddoch John.E.Riddoch@is.shell.com
Convey, Simon simon.convey@csfb.com
Dominic Clarke dominicc@foe.co.uk
Christopher Wilkinson Christopher.Wilkinson@gfk.de
Justin Stringfellow js70062@ms-egmp02-01.UK.Sun.COM
Harrington, David B. David.Harrington@dscr.dla.mil
Mikes List mikelist@sky.net
Joe Fletcher joe.fletcher@btconnect.com
Michael Morton mh1272@yahoo.com
Grieve, Shane SGrieve@templeinland.com
Ryan Bishop Ryan.Bishop@exim.gov
Doug Winter dwinter@icpeurope.net
David Beard beard@maths.adelaide.edu.au
tford tford@micron.com
and to others that have further suggestions.

Original message:
> Hi all,
> 
>    My (fairly) new Sunfire 280R Solaris 8 is crashing at least once a day with the /var/adm/messages errors below - not always the same except for "cannot stop cpu1". Although it looks to me like a hardware problem (cpu or RAM or something else?), I have some doubts because this only started after I had installed several continuously running applications.
> 
>    Should I be:
>       1) Calling my local Sun service now - definitely hardware
>       2) Stopping each application to see which is causing problems
>       3) Trying to understand crash dump files for more info.
> 
>  I appreciate your advice and will summarize.
> 
>        Richard
> 
> 
> 
> typical /var/adm/messages:
> 
> Jan 15 11:23:33 ed unix: [ID 350512 kern.notice] panic: failed to stop cpu1
> Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 862641 kern.warning] WARNING: [AFT1] Uncorrectable system bus (UE) Event detected by CPU0 Privileged Data Access at TL=0, errID 0x0000275a.fd4ac4d0
> Jan 15 11:23:33 ed     AFSR 0x00100004<PRIV,UE>.00000071 AFAR 0x00000000.f4679eb0
> Jan 15 11:23:33 ed     Fault_PC 0x10032084 Esynd 0x0071 J0100 J0202 J0304 J0406
> Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 364402 kern.notice] [AFT1] errID 0x0000275a.fd4ac4d0 Two Bits in error, likely from E$ WDU/CPU
> Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 180299 kern.info] [AFT2] errID 0x0000275a.fd4ac4d0 PA=0x00000000.f4679e80
> Jan 15 11:23:33 ed     E$tag 0x00000003.d1024124 E$state_2 Modified
> Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x63616c6c.6f75745f 0x7461736b.71000000 ECC 0x1fc
> Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
> Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00000000.00000000 0x00000000.00000008 ECC 0x097
> Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 819380 kern.info] [AFT2] E$Data (0x30) 0xc0010000.00010000 0x00000001.00000002 ECC 0x069 *Bad* Esynd=0x071
> Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
> Jan 15 11:23:33 ed SUNW,UltraSPARC-III+: [ID 291068 kern.warning] WARNING: [AFT1] EDU Event detected by CPU0 at TL=0, errID 0x0000275a.fd4ac4d0
> Jan 15 11:23:33 ed     AFSR 0x00000028<WDU,EDU>.00000071 AFAR 0x00000000.f4679eb0 AMBIGUOUS
> Jan 15 11:23:33 ed     Fault_PC 0x10032084 Esynd 0x0071 AMBIGUOUS
> Jan 15 11:23:34 ed SUNW,UltraSPARC-III+: [ID 856704 kern.notice] [AFT1] errID 0x0000275a.fd4ac4d0 Two Bits were in error
> Jan 15 11:23:34 ed unix: [ID 321153 kern.notice] NOTICE: Scheduling clearing of error on page 0x00000000.f4678000
> Jan 15 11:23:34 ed SUNW,UltraSPARC-III+: [ID 292220 kern.warning] WARNING: [AFT1] WDU Event detected by CPU0 at TL=0, errID 0x0000275a.fd4ac4d0
> Jan 15 11:23:34 ed     AFSR 0x00000028<WDU,EDU>.00000071 AFAR 0x00000000.f4679eb0 AMBIGUOUS
> Jan 15 11:23:34 ed     Fault_PC 0x10032084 Esynd 0x0071 AMBIGUOUS
> Jan 15 11:23:34 ed SUNW,UltraSPARC-III+: [ID 856704 kern.notice] [AFT1] errID 0x0000275a.fd4ac4d0 Two Bits were in error
> Jan 15 11:23:34 ed unix: [ID 321153 kern.notice] NOTICE: Scheduling clearing of error on page 0x00000000.f4678000
> Jan 15 11:23:35 ed unix: [ID 836849 kern.notice]
> Jan 15 11:23:35 ed ^Mpanic[cpu0]/thread=2a100045d20:
> Jan 15 11:23:35 ed unix: [ID 892114 kern.notice] [AFT1] errID 0x0000275a.fd4ac4d0 UE EDU WDU Error(s)
> Jan 15 11:23:35 ed     See previous message(s) for details
> Jan 15 11:23:35 ed unix: [ID 100000 kern.notice]
> Jan 15 11:23:35 ed genunix: [ID 723222 kern.notice] 000002a100044e90 SUNW,UltraSPARC-III+:cpu_aflt_log+560 (2a100044f4e, 1014bf08, 1014bee0, 0, 2a1000450d8, 2a100044f9b)
> Jan 15 11:23:35 ed genunix: [ID 179002 kern.notice]   %l0-3: 000002a100045540 000002a100045198 0000000000000003 0000000000000010
> Jan 15 11:23:35 ed   %l4-7: 00000300000658c8 0000030000e6dea8 0000000000000000 0000030000e6ded0
> Jan 15 11:23:36 ed genunix: [ID 723222 kern.notice] 000002a1000450e0 SUNW,UltraSPARC-III+:cpu_deferred_error+4d0 (400000000, 980c00000000, 1, 4010000403200071, 2a100045620, 4010000403200071)
> Jan 15 11:23:36 ed genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000001 000002a100045198 0000000000000000 0000000000000000
> Jan 15 11:23:36 ed   %l4-7: 0000000000000219 00000000f4679eb0 0000000000000000 000002a10001f910
> Jan 15 11:23:36 ed genunix: [ID 723222 kern.notice] 000002a100045570 unix:prom_rtt+0 (30000e75ea0, 2a100045d20, 1041c318, 10423a80, 2, 0)
> Jan 15 11:23:36 ed genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000005 0000000000001400 0000000000001604 000000001014185c
> Jan 15 11:23:36 ed   %l4-7: 0000000000000005 0000000000000004 000000000000000a 000002a100045620
> Jan 15 11:23:37 ed genunix: [ID 723222 kern.notice] 000002a1000456c0 genunix:taskq_dispatch+c (30000e75ea0, 100734b4, 300001f9000, 1, 30001d05ab0, 30000e75e80)
> Jan 15 11:23:37 ed genunix: [ID 179002 kern.notice]   %l0-3: 0000000010042270 0000000000000000 0000000000000000 000002a1000abd20
> Jan 15 11:23:37 ed   %l4-7: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Jan 15 11:23:37 ed genunix: [ID 723222 kern.notice] 000002a100045770 genunix:callout_schedule_1+a0 (300001f9000, 300001f9000, 20, 10000, 30000e75e3a, 30000e75e60)
> Jan 15 11:23:37 ed genunix: [ID 179002 kern.notice]   %l0-3: 00000000100734b4 0000030000e75e38 0000030000e75e30 0000030000e75e08
> Jan 15 11:23:37 ed   %l4-7: 0000030000e75e28 0000030000e75e60 00000300000658f0 0000000000000002
> Jan 15 11:23:38 ed genunix: [ID 723222 kern.notice] 000002a100045820 genunix:callout_schedule+54 (10439394, 1, 10439310, 8, 1, 30000162e70)
> Jan 15 11:23:38 ed genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 000002a1006d1ba0 000003000288dbd8 0000000000000000
> Jan 15 11:23:38 ed   %l4-7: 0000000000000000 00000300027a7568 0000000000000000 0000000000000000
> Jan 15 11:23:38 ed genunix: [ID 723222 kern.notice] 000002a1000458d0 genunix:clock+474 (1045d000, 1041b380, 1042e000, 325ffd4a80, 0, 0)
> Jan 15 11:23:38 ed genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000001 0000000000000001 000002a100071d20 0000000000000000
> Jan 15 11:23:38 ed   %l4-7: 000000001041b380 0000000000000016 000000001041bb40 000002a1006d1ba0
> Jan 15 11:23:39 ed genunix: [ID 723222 kern.notice] 000002a1000459a0 genunix:cyclic_softint+a4 (1041b380, 30000065928, 1, 3, 30000162dd0, 100746cc)
> Jan 15 11:23:39 ed genunix: [ID 179002 kern.notice]   %l0-3: 0000030000065930 000000000041e4c4 0000000000000000 0000030000162dd0
> Jan 15 11:23:39 ed   %l4-7: 00000300000658c8 0000030000e6dea8 0000000000000000 0000030000e6ded0
> Jan 15 11:23:39 ed genunix: [ID 723222 kern.notice] 000002a100045a60 unix:cbe_level10+8 (0, 803, 1041b380, 2a100045d20, 10060, 1000b2cc)
> Jan 15 11:23:39 ed genunix: [ID 179002 kern.notice]   %l0-3: 0000030000065930 0000000000010000 0000000000000000 0000030000162dd0
> Jan 15 11:23:39 ed   %l4-7: 00000300000658c8 0000030000e6dea8 0000000000000000 0000030000e6ded0
> Jan 15 11:23:40 ed unix: [ID 100000 kern.notice]
> Jan 15 11:23:40 ed genunix: [ID 672855 kern.notice] syncing file systems...
> Jan 15 11:23:40 ed genunix: [ID 904073 kern.notice]  done
> Jan 15 11:23:41 ed genunix: [ID 353387 kern.notice] dumping to /dev/dsk/c1t0d0s1, offset 859701248
> Jan 15 11:24:00 ed genunix: [ID 409368 kern.notice] ^M100% done: 40395 pages dumped, compression ratio 4.30,
> Jan 15 11:24:00 ed genunix: [ID 851671 kern.notice] dump succeeded
> 
> followed by typical reboot sequence.
> 
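
For anyone reading a similar log, here is a rough Python sketch of how
the values above fit together. It uses a few lines copied from the
excerpt (syslog prefix trimmed) and checks that the three [AFT1]
warnings share one errID, that the AFAR falls on the page scheduled for
clearing, and that it sits at offset 0x30 of the cache line reported as
PA. The function names are my own. The 8 KB page size is the sun4u base
page size; the 64-byte line size is an assumption based on the four
16-byte E$Data rows in the log.

```python
import re

# Lines copied from the quoted /var/adm/messages excerpt (syslog prefix trimmed).
LOG = """\
WARNING: [AFT1] Uncorrectable system bus (UE) Event detected by CPU0 Privileged Data Access at TL=0, errID 0x0000275a.fd4ac4d0
    AFSR 0x00100004<PRIV,UE>.00000071 AFAR 0x00000000.f4679eb0
[AFT2] errID 0x0000275a.fd4ac4d0 PA=0x00000000.f4679e80
[AFT2] E$Data (0x00) 0x63616c6c.6f75745f 0x7461736b.71000000 ECC 0x1fc
WARNING: [AFT1] EDU Event detected by CPU0 at TL=0, errID 0x0000275a.fd4ac4d0
NOTICE: Scheduling clearing of error on page 0x00000000.f4678000
WARNING: [AFT1] WDU Event detected by CPU0 at TL=0, errID 0x0000275a.fd4ac4d0
"""

PAGE_SIZE = 8192  # sun4u base page size (8 KB)
LINE_SIZE = 64    # ASSUMED E$ line size: four 16-byte E$Data rows are logged

HEX64 = r"0x([0-9a-f]{8})\.([0-9a-f]{8})"  # values are printed as high.low
EVENT_RE = re.compile(
    r"\[AFT1\] (.+?) Event detected by (CPU\d+).*errID (0x[0-9a-f]+\.[0-9a-f]+)"
)


def aft_events(text):
    """Return (event, cpu, errID) for each [AFT1] 'Event detected' line."""
    return [m.groups() for m in EVENT_RE.finditer(text)]


def logged_value(label, text):
    """Return the first 'label0xHHHHHHHH.HHHHHHHH' value as an integer."""
    m = re.search(re.escape(label) + HEX64, text)
    return int(m.group(1) + m.group(2), 16)


def first_row_ascii(text):
    """Decode the E$Data (0x00) row as ASCII, dropping trailing NUL bytes."""
    m = re.search(r"E\$Data \(0x00\) " + HEX64 + " " + HEX64, text)
    raw = bytes.fromhex("".join(m.groups()))
    return raw.rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    for event, cpu, err_id in aft_events(LOG):
        print(f"{cpu}: {event} (errID {err_id})")
    afar = logged_value("AFAR ", LOG)
    print("page base   :", hex(afar & ~(PAGE_SIZE - 1)))
    print("line base   :", hex(afar & ~(LINE_SIZE - 1)))
    print("line offset :", hex(afar % LINE_SIZE))
    print("E$Data(0x00):", first_row_ascii(LOG))
```

The offset 0x30 agrees with the E$Data row marked *Bad* in the log, and
the first row decodes to a string that matches the callout/taskq frames
in the stack trace. None of this says which component is faulty; it
only shows that the messages describe one event seen several ways.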



====================================================================
Richard Butler
Cell Biology Institute, C.N.R.                 tel: +39-06-90091-265
viale E.Ramarini, 32                           fax: +39-06-90091-260
Monterotondo Scalo (Roma)
I-00016 Italy                               email:rbutler@ibc.cnr.it
====================================================================
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers