SUMMARY: ecache parity error?

From: <Dan_Kelley_at_ssmhc.com>
Date: Thu Apr 25 2002 - 09:47:51 EDT
Thank you very much to all who responded:
Jed Dobson <jed@wgtech.com>
"Miller Sutfin" <millersutfin@earthlink.net>
Ray Ballisti <ballisti@ifh.ee.ethz.ch>
Aleksey Tsalolikhin <eesti@corp.earthlink.net>
crpollino@e-milio.com
"Craig Scott" <craig.scott@stc.ac.uk>
"JULIAN, JOHN C (AIT)" <jj2195@sbc.com>
Eric Priebe <epriebe@ACUS.com>


I received mixed results as to what it was exactly.  Some say the ecache 
parity error does not affect the U10s, some say it is the ecache parity 
error.  A few were nice enough to point out that EDP means Ecache Data 
Parity, so whatever the issue, it is with the ecache.
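
For anyone reading the AFSR values in the logs below, here is a minimal Python sketch of pulling out the flag bits and the parity syndrome. The bit masks are assumptions inferred only from the two annotated values in these logs (0x80400080 is tagged <PRIV,EDP>, 0x004000f0 is tagged <EDP>, and AFSR.PSYND matches the low 16 bits in both). decode_afsr is a hypothetical helper, not something shipped with Solaris.

```python
# Assumed masks, inferred from the annotated AFSR values in the logs:
#   0x80400080 -> <PRIV,EDP>, PSYND 0x0080
#   0x004000f0 -> <EDP>,      PSYND 0x00f0
AFSR_FLAGS = {
    0x80000000: "PRIV",  # error occurred in privileged mode
    0x00400000: "EDP",   # Ecache Data Parity
}
PSYND_MASK = 0xFFFF      # parity syndrome, low 16 bits


def decode_afsr(text):
    """Decode an AFSR string as printed in the log, e.g. '0x00000000.80400080'.

    Returns (list of flag names that are set, parity syndrome as an int).
    """
    # The kernel prints the 64-bit register as two 32-bit halves joined by '.'
    value = int(text.replace(".", ""), 16)
    flags = [name for bit, name in AFSR_FLAGS.items() if value & bit]
    return flags, value & PSYND_MASK
```

Calling decode_afsr("0x00000000.80400080") should give the PRIV and EDP flags with syndrome 0x80, which agrees with what the kernel printed for the first crash.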

Ray Ballisti <ballisti@ifh.ee.ethz.ch> suggested running the POST 
diagnostic routine, which actually came up clean in this case.  Thanks for 
the suggestion, though!

The overwhelming consensus was that the CPU needs to be replaced.

Thank you to everyone who responded, as well as anyone who may be 
responding as I type this!

 - Dan



ORIGINAL MESSAGE:


Hello, all.

We have a machine that keeps crashing, and I think it is the ecache parity 
error.  I have been waiting for it to happen again before I sent an e-mail 
to this list, though.  Could anyone look at this and tell me if they think 
it is the ecache error?  If not, any clues as to what it is?  Thanks in 
advance!  I will summarize.

 - Dan


uname -a:
SunOS netdev 5.8 Generic_108528-14 sun4u sparc SUNW,Ultra-5_10

Here is the info I have tracked for the first one (note that the two 
crashes are slightly different):

echo '$c' | adb -k unix.1 vmcore.1:

physmem 173a7
panicsys(104234b0,1040c198,10050068,78002000,57542400,c) + 44
vpanic(10050068,1040c198,16e76a3d8cac,10,30000689ea8,30000068438) + cc
panic(10050068,804,1,1041a798,fffd,20) + 1c
sync_handler(1041a980,10400000,0,0,0,2) + 150
prom_rtt(10000000,16,f0000000,16e7332a6da9,0,2)
client_handler(f0066d2c,2a10007d6e8,1,104283d8,1,1041a980) + 2c
prom_enter_mon(0,6,b,2a10004bd40,2a10007dd40,0) + 28
debug_enter(0,16e73315c8c5,16e73315c8c9,0,30000ddf1e8,0) + d0
kbdinput(1045a400,4d,30000689d68,300001b5000,0,1013dd4c) + 304
kbdrput(30000adabe8,30000f7e340,30000ad3a98,30000f7e340,30000689d68,30000ad3a20) 
+ 13c
putnext(30000adae48,30000ad9a90,30000adb0a8,30000f7e340,0,0) + 1cc
async_softint(30000f7e340,1,ffff,20000,0,30000adae48) + 568
asysoftintr(3000017a008,30000b7e000,1,2a10007dd40,10180,1026fba8) + 70
intr_thread(2a10001fd40,1041b180,10423890,10423890,0,0) + a4
idle(1040f864,0,0,1041b180,3000005d6c8,0) + 54
thread_start(0,0,0,0,0,0) + 4

/var/adm/messages from this one:

Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 932869 kern.warning] 
WARNING: [AFT1] EDP event on CPU0 Data access at TL=0, errID 
0x00015289.afcae2ba
Apr 12 17:59:18 netdev     AFSR 0x00000000.80400080<PRIV,EDP> AFAR 
0x00000000.3d41fa68
Apr 12 17:59:18 netdev     AFSR.PSYND 0x0080(Score 95) AFSR.ETS 0x00 
Fault_PC 0x10031cc8
Apr 12 17:59:18 netdev     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 
UDBL.ESYND 0x00
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 683009 kern.info] [AFT2] 
errID 0x00015289.afcae2ba PA=0x00000000.3d41fa68
Apr 12 17:59:18 netdev     E$tag 0x00000000.0003cf50 E$State: Modified 
E$parity 0x03 Badlines found=6
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] 
E$Data (0x00): 0x00000000.10041eb0
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] 
E$Data (0x08): 0x00000000.10041eb4
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] 
E$Data (0x10): 0x00000000.0247e008
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] 
E$Data (0x18): 0x00000000.10423890
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] 
E$Data (0x20): 0x00000000.10041eb0
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 989652 kern.info] [AFT2] 
E$Data (0x28): 0x80000000.00000000 *Bad* PSYND=0x0080
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] 
E$Data (0x30): 0x00000000.00000000
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] 
E$Data (0x38): 0x000002a1.000b7d20
Apr 12 17:59:18 netdev SUNW,UltraSPARC-IIi: [ID 601312 kern.info] [AFT2] 
errID 0x00015289.afcae2ba AFAR was derived from E$Tag
Apr 12 17:59:18 netdev unix: [ID 836849 kern.notice] 
Apr 12 17:59:18 netdev ^Mpanic[cpu0]/thread=2a10007dd20: 
Apr 12 17:59:18 netdev unix: [ID 455523 kern.notice] [AFT1] errID 
0x00015289.afcae2ba EDP Error(s)
Apr 12 17:59:18 netdev     See previous message(s) for details
Apr 12 17:59:18 netdev unix: [ID 100000 kern.notice] 
Apr 12 17:59:18 netdev genunix: [ID 723222 kern.notice] 000002a10007d200 
SUNW,UltraSPARC-IIi:cpu_aflt_log+4e0 (2a10007d2be, 1, 101483a0, 
2a10007d448, 2a10007d30b, 101483c8)
Apr 12 17:59:19 netdev genunix: [ID 179002 kern.notice]   %l0-3: 
0000000000000000 000002a10007d510 0000000000000003 0000000000000010
Apr 12 17:59:19 netdev   %l4-7: 0000000000200000 0000000000400000 
0000000000000000 000002a10001f9c0
Apr 12 17:59:19 netdev genunix: [ID 723222 kern.notice] 000002a10007d450 
SUNW,UltraSPARC-IIi:cpu_async_error+868 (1, 2a10007d510, 80400080, 0, 
640000080400080, 2a10007d6d0)
Apr 12 17:59:19 netdev genunix: [ID 179002 kern.notice]   %l0-3: 
0000000000000001 0000000000000032 0000000000000000 0000000000000000
Apr 12 17:59:19 netdev   %l4-7: 0000000000000219 0000000000000000 
000003000005d748 0000000000000000
Apr 12 17:59:19 netdev genunix: [ID 723222 kern.notice] 000002a10007d620 
unix:prom_rtt+0 (300001b2000, 8000000000000000, a, a, 0, 0)
Apr 12 17:59:19 netdev genunix: [ID 179002 kern.notice]   %l0-3: 
0000000000000001 0000000000001400 0000000000001600 000000001013fb54
Apr 12 17:59:19 netdev   %l4-7: 0000030000697ea0 0000000000000001 
000000000000000a 000002a10007d6d0
Apr 12 17:59:19 netdev genunix: [ID 723222 kern.notice] 000002a10007d770 
genunix:callout_schedule_1+4 (300001b2000, 10443508, 300001b5000, 
10072cf4, 0, 101424b0)
Apr 12 17:59:20 netdev genunix: [ID 179002 kern.notice]   %l0-3: 
0000000000000008 0000000000000002 0000000000000001 000000001041b718
Apr 12 17:59:20 netdev   %l4-7: 000000001041b338 0000000000000016 
000000001041baf8 000002a10007d7b0
Apr 12 17:59:20 netdev genunix: [ID 723222 kern.notice] 000002a10007d820 
genunix:callout_schedule+54 (104391fc, 1, 10439178, 8, 1, 300000683c8)
Apr 12 17:59:20 netdev genunix: [ID 179002 kern.notice]   %l0-3: 
00000000100d312c 0000030000cec000 0000030000d79602 0000030000cec000
Apr 12 17:59:20 netdev   %l4-7: 000003000188f040 0000000000000000 
000003000148af00 000002a10051dba0
Apr 12 17:59:20 netdev genunix: [ID 723222 kern.notice] 000002a10007d8d0 
genunix:clock+474 (1045a800, 1041b338, 1042dc00, 94f476874837, 0, 0)
Apr 12 17:59:20 netdev genunix: [ID 179002 kern.notice]   %l0-3: 
0000000000000000 0000000000000001 000002a10007dd20 0000000000000000
Apr 12 17:59:20 netdev   %l4-7: 000000001045a000 000000003b9aca00 
000000001041baf8 00000000fed3a004
Apr 12 17:59:20 netdev genunix: [ID 723222 kern.notice] 000002a10007d9a0 
genunix:cyclic_softint+a4 (1041b338, 30000057928, 1, 3, 30000068478, 
10073f0c)
Apr 12 17:59:20 netdev genunix: [ID 179002 kern.notice]   %l0-3: 
0000030000057930 800000000237f894 0000000000000000 0000030000068478
Apr 12 17:59:20 netdev   %l4-7: 00000300000578c8 000003000068dea8 
0000000000000000 000003000068ded0
Apr 12 17:59:21 netdev genunix: [ID 723222 kern.notice] 000002a10007da60 
unix:cbe_level10+8 (0, 803, 1041b338, 2a10007dd20, 10060, 1000b34c)
Apr 12 17:59:21 netdev genunix: [ID 179002 kern.notice]   %l0-3: 
00000000102e4934 0000000000000001 0000000000000001 0000030000070ed8
Apr 12 17:59:21 netdev   %l4-7: 0000000000000000 0000000000000000 
0000000000000000 0000000000000000
Apr 12 17:59:21 netdev unix: [ID 100000 kern.notice] 
Apr 12 17:59:21 netdev genunix: [ID 672855 kern.notice] syncing file 
systems...
Apr 12 17:59:21 netdev genunix: [ID 904073 kern.notice]  done
Apr 12 17:59:22 netdev genunix: [ID 353387 kern.notice] dumping to 
/dev/dsk/c0t0d0s1, offset 322174976
Apr 12 17:59:22 netdev uata: [ID 606412 kern.warning] WARNING: timeout: 
reset bus chno = 0 targ = 0
Apr 12 17:59:38 netdev genunix: [ID 409368 kern.notice] ^M100% done: 8116 
pages dumped, compression ratio 3.96, 
Apr 12 17:59:38 netdev genunix: [ID 851671 kern.notice] dump succeeded


And now for the second crash:

echo '$c' | adb -k unix.0 vmcore.0:

physmem 173a7
panicsys(104234b0,1040c198,10050068,78002000,39ff00,c) + 44
vpanic(10050068,1040c198,faabfb648,10,30000689ea8,30000068438) + cc
panic(10050068,804,1,1041a798,fffd,20) + 1c
sync_handler(1041a980,10400000,0,0,0,2) + 150
prom_rtt(10000000,16,f0000000,f810ca9c6,0,2)
client_handler(f0066d2c,2a10007d6e8,1,104283d8,1,1041a980) + 2c
prom_enter_mon(0,6,b,2a10004bd40,2a10007dd40,0) + 28
debug_enter(0,f80db6987,f80db698a,0,30001092020,0) + d0
kbdinput(1045a400,4d,30000689d68,300001b5000,0,1013dd4c) + 304
kbdrput(30000adabe8,3000108f080,30000ad3a18,3000108f080,30000689d68,30000ad39a0) 
+ 13c
putnext(30000adae48,30000ad9a90,30000adb0a8,3000108f080,0,0) + 1cc
async_softint(3000108f080,1,ffff,20000,0,30000adae48) + 568
asysoftintr(3000017a008,30000b7e000,1,2a10007dd40,10180,1026fba8) + 70
intr_thread(2a10001fd40,1041b180,10423890,10423890,0,0) + a4
idle(1040f864,0,0,1041b180,3000005d6c8,0) + 54
thread_start(0,0,0,0,0,0) + 4


/var/adm/messages leading up to the reboot:

Apr 24 12:20:07 netdev SUNW,UltraSPARC-IIi: [ID 370172 kern.warning] 
WARNING: [AFT1] EDP event on CPU0 Instruction access at TL=0, errID 
0x0001d01e.baad443a
Apr 24 12:20:07 netdev     AFSR 0x00000000.004000f0<EDP> AFAR 
0xffffffff.ffffffff
Apr 24 12:20:07 netdev     AFSR.PSYND 0x00f0(Score 45) AFSR.ETS 0x00 
Fault_PC 0x97560
Apr 24 12:20:07 netdev     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 
UDBL.ESYND 0x00
Apr 24 12:20:07 netdev SUNW,UltraSPARC-IIi: [ID 798591 kern.info] [AFT2] 
errID 0x0001d01e.baad443a No error found in ecache (No fault PA available)
Apr 24 12:20:07 netdev unix: [ID 836849 kern.notice] 
Apr 24 12:20:07 netdev ^Mpanic[cpu0]/thread=3000165a440: 
Apr 24 12:20:07 netdev unix: [ID 424580 kern.notice] [AFT1] errID 
0x0001d01e.baad443a EDP Error(s)
Apr 24 12:20:07 netdev     See previous message(s) for details
Apr 24 12:20:08 netdev unix: [ID 100000 kern.notice] 
Apr 24 12:20:08 netdev genunix: [ID 723222 kern.notice] 000002a1005dd6d0 
SUNW,UltraSPARC-IIi:cpu_aflt_log+4e0 (2a1005dd78e, 1, 101483a0, 
2a1005dd918, 2a1005dd7db, 101483c8)
Apr 24 12:20:08 netdev genunix: [ID 179002 kern.notice]   %l0-3: 
0000000000000000 000002a1005dd9e0 0000000000000003 0000000000000010
Apr 24 12:20:08 netdev   %l4-7: 0000000000200000 0000000000400000 
0000000000000001 0000000000000080
Apr 24 12:20:08 netdev genunix: [ID 723222 kern.notice] 000002a1005dd920 
SUNW,UltraSPARC-IIi:cpu_async_error+868 (1, 2a1005dd9e0, 4000f0, 0, 
1400000004000f0, 2a1005ddba0)
Apr 24 12:20:08 netdev genunix: [ID 179002 kern.notice]   %l0-3: 
0000000000000001 000000000000000a 0000000000000000 0000000000000000
Apr 24 12:20:08 netdev   %l4-7: 0000000000004208 0000000000000000 
00000000007fbdd0 0000000000000084
Apr 24 12:20:08 netdev unix: [ID 100000 kern.notice] 
Apr 24 12:20:08 netdev genunix: [ID 672855 kern.notice] syncing file 
systems...
Apr 24 12:20:09 netdev genunix: [ID 733762 kern.notice]  1
Apr 24 12:20:10 netdev genunix: [ID 904073 kern.notice]  done
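
To see how often these events recur, the [AFT1] warning lines can be pulled out of /var/adm/messages with a short script. The following is only a sketch: it assumes each syslog entry sits on a single line in the real file (the lines above were wrapped by the mailer), and the pattern is modelled solely on the two warnings shown here. find_aft1_events is a hypothetical name.

```python
import re

# Pattern modelled on the "[AFT1] EDP event on CPU0 ... errID 0x..." warnings
# quoted above; other AFT1 message layouts may not match.
AFT1_RE = re.compile(
    r"\[AFT1\] (?P<kind>\w+) event on CPU(?P<cpu>\d+) "
    r"(?P<access>\w+) access at TL=(?P<tl>\d+), errID\s+"
    r"(?P<errid>0x[0-9a-fA-F]+\.[0-9a-fA-F]+)"
)


def find_aft1_events(lines):
    """Return one dict per [AFT1] event warning found in an iterable of lines."""
    events = []
    for line in lines:
        match = AFT1_RE.search(line)
        if match:
            events.append(match.groupdict())
    return events
```

It could be run as find_aft1_events(open("/var/adm/messages")). The panic line that repeats the errID ("[AFT1] errID ... EDP Error(s)") is deliberately not matched, so each event is counted once.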
Received on Thu Apr 25 09:53:24 2002
