SUMMARY[2]: ar_rput_dlpi error messages

From: Kun Li (likun@bjaimail.asiainfo.com)
Date: Wed Aug 26 1998 - 05:13:56 CDT


After my first summary, I received several valuable replies regarding this
problem.

Thanks to the following people:
        Mark Sherman, Frank Smith, and Colin Melville

The conclusion:

1. I will try to repair this disk and see whether we need a new one.

2. About the 'cpu6 panic' error, here is an excerpt from the README of
    patch 103640-22:

   (from 103640-18)

    1234968 System Panic, ufs_ifree: freeing free inode, mode= %o, ino = %d, fs = %s

   So this problem should be fixed as of patch level 103640-18, but running
   uname on my system gives:

   % uname -a
   SunOS www 5.5.1 Generic_103640-18 sun4u sparc SUNW,Ultra-Enterprise

   That is also patch level 103640-18, so why did I still get a similar error?
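
   The comparison above can be sketched in a few lines of Python. This is
   only an illustration: the helper names kernel_patch and has_at_least are
   made up for this example, and it assumes the kernel patch always appears
   in the uname output as a "Generic_<id>-<rev>" token, as it does here.

```python
import re

def kernel_patch(uname_output):
    """Return (patch_id, revision) from a 'Generic_<id>-<rev>' token, or None."""
    match = re.search(r"Generic_(\d+)-(\d+)", uname_output)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

def has_at_least(uname_output, patch_id, min_revision):
    """True if the running kernel patch matches patch_id at min_revision or later."""
    found = kernel_patch(uname_output)
    return found is not None and found[0] == patch_id and found[1] >= min_revision

# The uname -a line quoted above.
line = "SunOS www 5.5.1 Generic_103640-18 sun4u sparc SUNW,Ultra-Enterprise"

print(kernel_patch(line))                # ('103640', 18)
print(has_at_least(line, "103640", 18))  # True
print(has_at_least(line, "103640", 22))  # False
```

   This only confirms which kernel patch revision is running. It does not
   show whether the panic seen here is the same bug as the one listed in the
   README.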

3. About the "ar_rput_dlpi" message, Mark Sherman hit the nail on the head.
    This error can be produced by trying to ifconfig an interface UP when it
    has no address, or for some other reason. In my case, this error always
    accompanies the hme0 configuration. I think the HA system is
    malfunctioning.
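
    To see how often the message appears and with which codes, the fields
    can be pulled out of the syslog lines. The sketch below is an assumption
    about the line layout based only on the three lines in my excerpt; the
    name parse_ar_rput is hypothetical. It does not translate the numbers
    into DLPI names; for that, the values would have to be looked up in the
    DLPI header on the system.

```python
import re

# Pattern built from the lines in the excerpt, e.g.
# "ar_rput_dlpi: DL_??? (7) failed, dl_errno 3,dl_unix_errno0"
AR_RPUT = re.compile(
    r"ar_rput_dlpi: (?P<prim>\S+) \((?P<code>\d+)\) failed, "
    r"dl_errno (?P<dl_errno>\d+),\s*dl_unix_errno\s*(?P<unix_errno>\d+)"
)

def parse_ar_rput(line):
    """Return the numeric fields of one ar_rput_dlpi message, or None."""
    match = AR_RPUT.search(line)
    if match is None:
        return None
    return {
        "primitive": match.group("prim"),
        "code": int(match.group("code")),
        "dl_errno": int(match.group("dl_errno")),
        "dl_unix_errno": int(match.group("unix_errno")),
    }

sample = ("17:17:01 www unix: ar_rput_dlpi: DL_??? (7) failed, "
          "dl_errno 3,dl_unix_errno0")
print(parse_ar_rput(sample))
```

    Lines such as "last message repeated 3 times" do not match the pattern
    and return None, so a real count would need to handle them separately.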

The responses follow:
===========================
From Mark Sherman:
The WARNING message is from the SCSI driver. Though it says warning,
what's really important is the Error Level: Retryable, and did it recover?
No: "ASC: 0x11 (unrecovered read error)". This read error occurred on sd0,
probably your root disk device.

        The "ar_rput_dlpi: DL_???" part refers to a problem with a network
interface. This error can be produced by trying to ifconfig an interface UP
when it has no address; there may be other reasons. Since this error
occurred after the reboot, I'd check all of your configuration files to make
certain they are correct and intact.

        I'd replace the bad drive.

Mark Sherman

----------------------------
From Frank Smith:
Look at the timestamps. The original message was a WARNING, telling
you a read attempt failed, but since the error level was retryable,
it tries a total of 3 (maybe 5) times, and if the retries succeed
then that's the end of it. Getting these warning messages usually
means the disk is beginning to fail (or at least is in need of
reformatting, since some drives have a problem with tracking changing
over time).
   The important message is the one logged a couple of seconds later,
the 'unrecovered read error'. That means all the read retries also
failed, so the system gave up and moved on without the data it wanted.
If the read was of a non-critical file it may just show up as odd
behaviour (if it was a man page, for example, the display would just
appear truncated, if it was your .cshrc you may just end up with a
shorter path or fewer aliases). Errors reading other files may cause
unexplained core dumps from applications or odd results from them.
   If it is a necessary system file, something like part of the kernel
or part of your swap space (as yours evidently was; find the file
associated with the listed inode to see what it was. I think the
'requested block' is the start of the file and the 'error block' is
the actual block that had the read failure), it can cause a panic. Some
of these can be time-delayed, i.e. it errors reading a library, but
the actual function it needed from the library was ok, but later
another function is needed from the same library and the system
doesn't do a read because it thinks it has it in memory (or swap),
and bombs when it tries to run the truncated library routine.
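
   As a rough aid for reading the sd0 message, the distance between the
   'requested block' and the 'error block' can be computed from the log
   line. This is only a sketch: the name block_offset is hypothetical, and
   the 512-byte block size is an assumption that is not stated anywhere in
   the log.

```python
import re

BLOCK_SIZE = 512  # assumption: 512-byte disk blocks (not given in the log)

def block_offset(message):
    """Return (requested, error, error - requested) from an sd error line."""
    match = re.search(r"Requested Block:\s*(\d+)\s+Error Block:\s*(\d+)", message)
    if match is None:
        return None
    requested, error = int(match.group(1)), int(match.group(2))
    return requested, error, error - requested

# Line taken from the /var/adm/messages excerpt in the original question.
line = "08:50:18 www unix: Requested Block: 7698880 Error Block: 7698894"

requested, error, offset = block_offset(line)
print(offset)               # 14
print(offset * BLOCK_SIZE)  # 7168
```

   So the failing block sits 14 blocks past the requested one, or 7168
   bytes if the 512-byte assumption holds.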
   
Good luck,

--
Frank Smith                              Voice: (512) 343-2002 x449
Systems Administrator                    Fax: (512)343-1717
TradeOne Marketing.                      Email: franks@tradeonemktg.com
11149 Research Blvd. Ste. 400            Web: www.tradeonemktg.com
Austin TX 78759-5227                       

----------------------------
From Colin Melville:

If it panicked the system (as it appears it did from the syslog snippet
you posted), then it's serious, IMHO.

Colin Melville
Technology Partners

==========================
And last, the first summary follows:

One week ago, I sent a mail regarding some error messages excerpted from
the messages file. So far there is one response, from rashmi. Here it is:

>kun li
>
>what i suspect is your disk has gone bad since it is showing
>media errors. What you can do is boot in single user mode and put the
>disk in analyze mode for read test or refresh test (format). If it is
>giving media error on a single block try to repair it. If it is hardware
>error you have to replace your disk.
>
>regards
>rashmi

But it was only a WARNING message about media errors, so I don't know how
serious this warning is. I also want to know the relationship between all
these messages and the rebooting of the server. Can anyone tell me what on
earth these error messages mean?

Any help will be greatly appreciated.

likun@bjaimail.asiainfo.com
Asiainfo Computer Network Co. ltd.

My original question:
================
Today, one of our customers reported several system reboots. From the
/var/adm/messages file, I excerpted the following error messages:

08:50:16 www unix: WARNING: /sbus@3,0/SUNW,fas@3,8800000/sd@0,0 (sd0):
08:50:18 www unix: Error for Command: read(10) Error Level: Retryable
08:50:18 www unix: Requested Block: 7698880 Error Block: 7698894
08:50:18 www unix: Vendor: SEAGATE Serial Number: 00713682
08:50:18 www unix: Sense Key: Media Error
08:50:18 www unix: ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xd0

panic[cpu6]/thread=0x618164c0: free: freeing free block, dev=0x800007, block=13456, fs=/home1
17:10:30 www unix: syncing file systems...panic[cpu6]/thread=0x30037ec0: panic sync timeout
17:10:30 www unix: 6492 static and sysmap kernel pages

17:17:01 www unix: ar_rput_dlpi: DL_??? (7) failed, dl_errno 3,dl_unix_errno0
17:17:02 www unix: ar_rput_dlpi: DL_??? (7) failed, dl_errno 3,dl_unix_errno0
17:17:03 www unix: ar_rput_dlpi: DL_??? (7) failed, dl_errno 3,dl_unix_errno0
17:17:06 www last message repeated 3 times

Is there any relationship between these error messages? What do they mean?

Platform: Sun E3000, Solaris 2.5.1

Thanks in advance; I will summarize.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
my->name = "likun";
my->email[0] = "likun@bjaimail.asiainfo.com";
my->email[1] = "likun@bjai.asiainfo.com";
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


