SUMMARY: Urgent help for panicking E420, followup questions too.

From: Bevan Broun <brounb_at_adi-limited.com>
Date: Thu Jul 05 2001 - 19:15:06 EDT
My message below didn't make it to the list, it seems. This summary serves a
dual purpose: a test post to the list and a summary.

The machine was panicking after a second or two in single-user mode.

The problem was that /var had become corrupted and was not getting fscked
after each crash. This was due to the "logging" option in vfstab:

/dev/md/dsk/d0  /dev/md/rdsk/d0 /       ufs     1       no      logging
/dev/md/dsk/d20 /dev/md/rdsk/d20        /var    ufs     1       no logging
/dev/md/dsk/d30 /dev/md/rdsk/d30        /opt    ufs     2       yes logging
/dev/md/dsk/d40 /dev/md/rdsk/d40        /export ufs     2       yes logging

This was done by a Sun engineer! (It's not my box to look after.) This
option has now been removed.

The machine was only staying up for seconds before panicking. The solution to
the problem was to edit /etc/vfstab and remove the option, but I could not
do this with only a second or so before each panic. (I didn't want a corrupt / -
actually that might have helped.)

I had to:
unmirror root and make the system use the plain partition as root;
boot from cdrom, fsck the root partition, mount it and edit etc/vfstab to
remove the logging option;
boot again - the machine panics and this time fscks. (Actually it didn't quite
work that way: we removed "logging" instead of replacing it with a "-", so I
got to a system with /var unmounted.)
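
The difference between deleting "logging" and replacing it with "-" is that
each vfstab entry is expected to have seven fields, the last being the mount
options. A minimal sketch of the safer edit, assuming "logging" is the only
option on the line (the function name strip_logging is made up for
illustration, and on an old Solaris box nawk may be a better choice than awk):

```shell
# Hypothetical helper: print a vfstab with a lone "logging" mount option
# replaced by "-", so the entry keeps all seven fields.
# Assumes "logging" is the only option in field 7; comment lines are left alone.
# Reads the named file, or stdin if none is given.
strip_logging() {
    awk 'BEGIN { OFS = "\t" }
         $1 !~ /^#/ && NF == 7 && $7 == "logging" { $7 = "-" }
         { print }' "$@"
}

# Example (writes a new file rather than editing in place):
# strip_logging /a/etc/vfstab > /a/etc/vfstab.new
```

Entries that are changed are rewritten with tab separators; all other lines
are printed exactly as they were.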

The commands needed were:
metadetach d0 d1

metaroot /dev/dsk/c0t0d0s0 
NOTE: this command needed to be run several times, as
the changes were not being written to disk before the next panic. We kept
booting until we saw that root was indeed getting mounted on /dev/dsk/c0t0d0s0.

ok boot cdrom -s 
NOTE: actually we just quit out of the install, but I've since
been told that this would have got me to the prompt a little quicker.

fsck  /dev/dsk/c0t0d0s0
mount /dev/dsk/c0t0d0s0 /a
vi /a/etc/vfstab
shutdown -y -g0 -i0

ok boot -s
fsck /var
fsck /opt

DONE
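
One way to check between boots whether a metaroot change really reached the
disk is to look at the root entry in vfstab, which metaroot rewrites (along
with /etc/system). A minimal sketch; the function name root_device is made up
for illustration:

```shell
# Hypothetical helper: print the "device to mount" field of the root (/)
# entry in a vfstab file. Reads the named file, or stdin if none is given.
root_device() {
    awk '$1 !~ /^#/ && $3 == "/" { print $1 }' "$@"
}

# Example: after "metaroot /dev/dsk/c0t0d0s0" has taken effect this should
# print the plain slice rather than /dev/md/dsk/d0.
# root_device /etc/vfstab
```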

Followup questions:
Could I have forced a boot from the ok prompt which would have mounted
only /?
Could I have forced a boot which would have fscked all filesystems?

I wish my message had got through!

BB

on Wed, Jul 04, 2001 at 07:33:10PM +1000, Bevan Broun <brounb@adi-limited.com> wrote:
> Hello managers
> 
> Bit of an urgent problem, hope you can help.
> 
> An E420R keeps panicking. It will only stay up for a few minutes in single
> user mode, seems like less in multi user mode.
> 
> I'm suspecting either fs corruption on / or /var or faulty disks for these
> partitions. The system uses DiskSuite and has mirrored internal disks for
> filesystems /, /var and /opt. The external disks are configured as a raid5
> but these are not my concern at the moment - I'll fsck them once I have a
> stable single user system. Some more precise info:
> 
> root@osiris # metastat -p
> d0 -m d1 d2 1
> d1 1 1 c0t0d0s0
> d2 1 1 c0t1d0s0
> d10 -m d11 d12 1
> d11 1 1 c0t0d0s1
> d12 1 1 c0t1d0s1
> d20 -m d21 d22 1
> d21 1 1 c0t0d0s3
> d22 1 1 c0t1d0s3
> d30 -m d31 d32 1
> d31 1 1 c0t0d0s5
> d32 1 1 c0t1d0s5
> d40 -r c1t0d0s0 c1t2d0s0 c1t3d0s0 c1t4d0s0 c1t5d0s0 c2t8d0s0 c2t9d0s0
> c2t10d0s0
> c2t11d0s0 c2t12d0s0 -k -i 32b
> hsp001 c2t13d0s2 c1t1d0s2
> 
> root@osiris # cat /etc/vfstab
> fd      -       /dev/fd fd      -       no      -
> /proc   -       /proc   proc    -       no      -
> /dev/md/dsk/d10 -       -       swap    -       no      -
> /dev/md/dsk/d0  /dev/md/rdsk/d0 /       ufs     1       no      logging
> /dev/md/dsk/d20 /dev/md/rdsk/d20        /var    ufs     1       no logging
> /dev/md/dsk/d30 /dev/md/rdsk/d30        /opt    ufs     2       yes logging
> /dev/md/dsk/d40 /dev/md/rdsk/d40        /export ufs     2       yes logging
> swap    -       /tmp    tmpfs   -       yes     -
> 
> When the system panics it looks like this (lines have been wrapped) :
> 
> root@osiris # ls l*
> panic[cpu1]/thread=2a100377d40: free: freeing free frag, dev:0x5500000014,
> blk:2
> 63, cg:30, ino:247723, fs:/var
> 
> 000002a1003772e0 ufs:real_panic_v+70 (0, 10466000, 2a100377580, 0,
> 5eec5c00, 300
> 01969940)
>   %l0-3: 0000000000003b10 0000030000182000 0000030001969a30
>   %0000000010009c78
>   %l4-7: 0000000000000010 00000300002fb358 0000000000000000
>   %000002a10001f950
> 000002a100377390 ufs:ufs_fault_v+48 (2a100377748, 10466000, 2a100377580,
> 2a10037
> 7748, 5b, 10466000)
>   %l0-3: 000000005eec6000 0000030000182000 0000000000000400
>   %0000030001969940
>   %l4-7: 000000005eec6000 000003000000e908 00000300025c5e20
>   %00000000002f9250
> 000002a100377440 ufs:ufs_fault+1c (2a100377748, 10466000, 5500000014, 107,
> 1e, 3
> c7ab)
>   %l0-3: 000000000000c908 000000001045a000 0000030001969bc0
>   %000000005f24d600
>   %l4-7: 0000000000000080 0000000000000080 000000005f24d600
>   %0000000000000000
> 000002a1003774f0 ufs:free+498 (400, 300026640a8, 100, 1e, 30002664688,
> 300026640
> 34)
>   %l0-3: 0000030002664668 00000300025c5e20 0000030001102540
>   %000002a1003776b8
>   %l4-7: 0000000000000010 00000300019ac000 0000030002664000
>   %0000000000000107
> 000002a100377600 ufs:ufs_itrunc+734 (ffffffffffffffff, 400, 10, f,
> fffffffffffff
> fff, b)
>   %l0-3: 000002a1003776b8 00000300019ac000 0000000000000000
>   %000003000294b490
>   %l4-7: 000003000080df28 0000000000000000 ffffffffffffffff
>   %0000000000000000
> 000002a1003778f0 ufs:ufs_trans_itrunc+1bc (ffbf, 40, 0, 30001102588,
> 3000294b5d0
> , 300019ac000)
>   %l0-3: 000000001033e060 0000000000000000 0000000000000001
>   %0000030001102540
>   %l4-7: 0000000000000000 000003000080df28 0000000000000000
>   %000003000294b490
> 000002a1003779b0 ufs:ufs_delete+e4 (30001102540, 3000294b490, 1, 0,
> 3000294b520,
>  3000294b490)
>   %l0-3: 000002a100377b4a 0000030001102540 0000000000000001
>   %0000000000002270
>   %l4-7: 0000030ffffd5e68 0000000000000000 0000000000000000
>   %000002a10001f950
> 000002a100377a80 ufs:ufs_thread_delete+c4 (3000021d7c8, 0, 10423840,
> 30001102540
> , 300011025b0, 0)
>   %l0-3: 0000030001102590 000003000294b490 0000000000000004
>   %000002a10001fd40
>   %l4-7: 0000000000000000 00000300001897b8 0000000000000000
>   %000002a10001fa00
> 
> syncing file systems... [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] cannot sync
> -- g
> iving up
> dumping to /dev/md/dsk/d10, offset 429916160
> WARNING: md: d21: write error on /dev/dsk/c0t0d0s3
> WARNING: md: d2: write error on /dev/dsk/c0t1d0s0
> 100% done: 10604 pages dumped, compression ratio 6.79, dump succeeded
> rebooting...
> Resetting ...
> 
> screen not found.
> Can't open input device.
> Keyboard not present.  Using ttya for input and output.
> 
> Sun Enterprise 420R (2 X UltraSPARC-II 450MHz), No Keyboard
> OpenBoot 3.29, 1024 MB memory installed, Serial #15277280.
> Ethernet address 8:0:20:e9:1c:e0, Host ID: 80e91ce0.
> 
> Initializing Memory -
> 
> I've got screen output from a few panics:
> 
> brounb@edward>grep 'WARNING: md:' notes.txt
> WARNING: md: d1: write error on /dev/dsk/c0t0d0s0
> WARNING: md: d21: read error on /dev/dsk/c0t0d0s3
> WARNING: md: d21: write error on /dev/dsk/c0t0d0s3
> WARNING: md: d1: read error on /dev/dsk/c0t0d0s0
> WARNING: md: d21: write error on /dev/dsk/c0t0d0s3
> WARNING: md: d2: write error on /dev/dsk/c0t1d0s0
> 
> I did a boot from CD and the system behaved itself. I'm off to try and do a
> ufsdump.
> 
> I have the luxury of a spare (new) E420 and spare (new) drives. I'm trying
> to figure out the fastest and safest way to have a working system for the
> users tomorrow.
> 
> I want to know if I'm dealing with dying disks, filesystem corruption in /
> or /var, a dying HD controller or something else. Any other useful advice is,
> of course, very welcome.
> 
> TIA
> 
> Bevan Broun
Received on Fri Jul 6 00:15:06 2001