Summary: help! picld error - is it a hardware issue?

From: ktn <ktn_at_dodo.com.au>
Date: Wed Jun 16 2004 - 20:17:26 EDT
Thank you John Benjamins, Joe Fletcher, Doug Hughes, Ayaz Anjum, Julio
Carrasco and anyone else I might have missed for the responses. Everyone has
been very helpful! :)

It was a patch thing in the end (I think). I decided to power off the server
and try a reconfigure boot again (in diagnostics mode, in fact, which showed
the fans responding OK). Sun initially thought it was a failed CPU fan when
the two patches didn't work. But looking at the log during booting, there
were lots of picld errors from the start, things like "no such file or
directory" and "error running psvc_fan_fault_check_policy_0" on lots of
things, which makes it sound like a dodgy boot in the first place.
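
For anyone who wants to tally these picld errors rather than eyeball them, here is a minimal Python sketch. It assumes the wrapped log lines have been joined back into single lines, as they appear in /var/adm/messages; the function name count_policy_errors and the regular expression are my own illustration, not anything Sun supplies.

```python
import re
from collections import Counter

# Matches a picld policy failure once the wrapped line has been rejoined, e.g.
# "... picld[93]: [ID 478985 daemon.error] ERROR running <policy> on <device> (...)"
POLICY_ERROR = re.compile(r"picld\[\d+\]:.*ERROR running (\S+) on (\S+)")


def count_policy_errors(lines):
    """Count picld policy failures per (policy, device) pair."""
    counts = Counter()
    for line in lines:
        match = POLICY_ERROR.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Sample lines taken from the log excerpt in the original question (rejoined).
sample = [
    "Jun 11 03:12:13 serv picld[93]: [ID 478985 daemon.error] ERROR running psvc_fan_fault_check_policy_0 on CPU1_PRIM_FAN (2499992)",
    "Jun 11 03:12:13 serv picld[93]: [ID 710302 daemon.error] I/O error",
    "Jun 11 03:12:15 serv picld[93]: [ID 478985 daemon.error] ERROR running psvc_fan_fault_check_policy_0 on IO_BRIDGE_PRIM_FAN (2500216)",
    "Jun 11 03:12:48 serv picld[93]: [ID 478985 daemon.error] ERROR running psvc_fan_fault_check_policy_0 on CPU0_PRIM_FAN (2499960)",
    "Jun 11 03:12:49 serv picld[93]: [ID 478985 daemon.error] ERROR running psvc_fan_fault_check_policy_0 on CPU1_PRIM_FAN (2499992)",
]

for (policy, device), n in count_policy_errors(sample).most_common():
    print(f"{n:3d}  {policy}  {device}")
```

On a live system the same function could be fed an open file handle for /var/adm/messages instead of the sample list. If every fan shows failures right from boot, that points more towards picld or its configuration than towards one dead fan.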

Sun said that powering off the server before doing a boot -r will sometimes
help. Now that I think about it, the first time around I had shut down to
single-user mode and applied the patches before doing the reconfigure boot,
without powering off.

Many thanks once again!
Kath

PS: Most of the responses I received apply to Solaris 8 rather than 9, but I
attach them here anyway.

Original Q:

>
> Dear managers,
>
> Some updates: it seems that John Benjamins had a similar problem in
> Solaris 8. I don't see any similar patches for V880s with Solaris 9 though,
> other than 113447-17 and 113573 that Sun pointed to. With these two patches
> I can see my memory information now, but prtdiag -v shows the following
> unusual environmental status (similar to that mentioned in
> http://sunportal.sunmanagers.org/pipermail/summaries/2003-June/004000.html),
> as well as the same console errors from picld as mentioned before.
>
>
> Fan Bank :
> ----------
>
> Bank                        Speed         Status        Fan State
>                            ( RPMS )
> ----                       --------      ---------      ---------
> CPU0_PRIM_FAN   failed in picl_get_propval_by_name for fan speed
> General system failure
>
> I guess I'll wait for more updated patches from Sun now for Solaris 9.
> Thanks Joe Fletcher for pointing me towards the picld patches. Very
> flaky indeed.
>
> Oh, the (I assume) latest README for 113573-05 recommends installing patch
> 113574-07, but the latter patch has been withdrawn. Oh well.
>
>
> Original Q:
>
> >
> > Dear managers, need your prompt help!
> >
> > I've been getting these errors in /var/adm/messages constantly since
> > rebooting a machine, a Sun Fire V880 running Solaris 9 Generic_112233-12
> > (the reboot was due to a power failure, by the way) --
> >
> > ....
> > Jun 11 03:12:12 serv picld[93]: [ID 710302 daemon.error] I/O error
> > Jun 11 03:12:13 serv picld[93]: [ID 478985 daemon.error] ERROR running psvc_fan_fault_check_policy_0 on CPU1_PRIM_FAN (2499992)
> > Jun 11 03:12:13 serv picld[93]: [ID 710302 daemon.error] I/O error
> > Jun 11 03:12:15 serv picld[93]: [ID 478985 daemon.error] ERROR running psvc_fan_fault_check_policy_0 on IO_BRIDGE_PRIM_FAN (2500216)
> > Jun 11 03:12:15 serv picld[93]: [ID 710302 daemon.error] I/O error
> > Jun 11 03:12:48 serv picld[93]: [ID 478985 daemon.error] ERROR running psvc_fan_fault_check_policy_0 on CPU0_PRIM_FAN (2499960)
> > Jun 11 03:12:48 serv picld[93]: [ID 710302 daemon.error] I/O error
> > Jun 11 03:12:49 serv picld[93]: [ID 478985 daemon.error] ERROR running psvc_fan_fault_check_policy_0 on CPU1_PRIM_FAN (2499992)
> > Jun 11 03:12:49 serv picld[93]: [ID 710302 daemon.error] I/O error
> > Jun 11 03:12:51 serv picld[93]: [ID 478985 daemon.error] ERROR running psvc_fan_fault_check_policy_0 on IO_BRIDGE_PRIM_FAN (2500216)
> > ....
> >
> >
> > In the logs during the reboot, the "PS2 Device unplugged" is the last
> > error picld gives... could this be a cause of the problem? --
> > ....
> > May 30 20:37:57 serv eri: [ID 517527 kern.info] SUNW,eri0 : 100 Mbps full duplex link up
> > May 30 20:38:00 serv last message repeated 1 time
> > May 30 20:38:02 serv pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
> > May 30 20:38:02 serv genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0
> > May 30 20:42:23 serv picld[93]: [ID 293134 daemon.error] Device PS2 unplugged
> > May 30 20:42:50 serv fsck[164]: [ID 293258 user.error] libsldap: Status: 2 Mesg: Unable to load configuration '/var/ldap/ldap_client_file' ('').
> > May 30 20:42:50 serv last message repeated 3 times
> > May 30 20:42:50 serv picld[93]: [ID 478985 daemon.error] ERROR running psvc_fan_fault_check_policy_0 on CPU0_PRIM_FAN (2499960)
> > May 30 20:42:50 serv picld[93]: [ID 875627 daemon.error] No such file or directory
> > May 30 20:42:51 serv fsck[164]: [ID 293258 user.error] libsldap: Status: 2 Mesg: Unable to load configuration '/var/ldap/ldap_client_file' ('').
> > May 30 20:42:51 serv last message repeated 5 times
> > May 30 20:42:52 serv picld[93]: [ID 478985 daemon.error] ERROR running psvc_fan_fault_check_policy_0 on CPU1_PRIM_FAN (2499992)
> > May 30 20:42:52 serv picld[93]: [ID 875627 daemon.error] No such file or directory
> > May 30 20:42:53 serv fsck[164]: [ID 293258 user.error] libsldap: Status: 2 Mesg: Unable to load configuration '/var/ldap/ldap_client_file' ('').
> > May 30 20:42:53 serv last message repeated 2 times
> > ....
> >
> > Running prtdiag shows the following, and the "no memory" part is giving
> > me a heart attack. Could this just be (from the logs above) an incomplete
> > boot? I am thinking of rebooting the machine and seeing if it will be the
> > same, or do you think something is failing for sure? Many thanks in
> > advance for reading. Will summarise.
> >
> > >prtdiag -v
> > System Configuration:  Sun Microsystems  sun4u Sun Fire 880
> > System clock frequency: 150 MHz
> > Memory size: 8192 Megabytes
> >
> > ========================= CPUs ===============================================
> >
> >           Run    E$    CPU     CPU
> > Brd  CPU  MHz    MB   Impl.    Mask
> > ---  ---  ----  ----  -------  ----
> > A    0    750   8.0  US-III   5.4
> > B    1    750   8.0  US-III   5.4
> > A    2    750   8.0  US-III   5.4
> > B    3    750   8.0  US-III   5.4
> >
> > ========================= Memory Configuration ===============================
> >
> >           Logical  Logical  Logical
> >      MC   Bank     Bank     Bank         DIMM    Interleave  Interleaved
> > Brd  ID   num      size     Status       Size    Factor      with
> > ----  ---  ----     ------   -----------  ------  ----------  -----------
> > Cannot find any memory bank/segment info.
> &gt;
> > ========================= IO Cards =========================
> >
> >
> > Bus  Max
> > IO   Port Bus       Freq Bus  Dev,
> > Brd  Type  ID  Side Slot MHz  Freq Func State Name                              Model
> > ---- ---- ---- ---- ---- ---- ---- ---- ----- --------------------------------  ----------------------
> > I/O  PCI   9    A    8    33   66  1,0  ok    SUNW,m64B                         SUNW,370-4362
> >
> > No failures found in System
> > ===========================
> >
> >
> > ========================= Environmental Status =========================
> >
> > System Temperatures (Celsius):
> > -------------------------------
> > Device          Temperature     Status
> > ---------------------------------------
> > CPU0             68             OK
> > CPU1             73             OK
> > CPU2             59             OK
> > CPU3             61             OK
> > MB               31             OK
> > IOB              26             OK
> > DBP0             28             OK
> >
> > =================================
> >
> > Front Status Panel:
> > -------------------
> > Keyswitch position: NORMAL
> >
> > System LED Status:
> > GEN FAULT                REMOVE
> > [OFF]                    [OFF]
> >
> > DISK FAULT               POWER FAULT
> > [OFF]                    [OFF]
> >
> > LEFT THERMAL FAULT       RIGHT THERMAL FAULT
> > [OFF]                    [OFF]
> >
> > LEFT DOOR                RIGHT DOOR
> > [OFF]                    [OFF]
> >
> > =================================
> >
> > Disk Status:
> >           Presence      Fault LED       Remove LED
> > DISK   0: [PRESENT]        [OFF]           [OFF]
> > DISK   1: [PRESENT]        [OFF]           [OFF]
> > DISK   2: [PRESENT]        [OFF]           [OFF]
> > DISK   3: [PRESENT]        [OFF]           [OFF]
> > DISK   4: [PRESENT]        [OFF]           [OFF]
> > DISK   5: [PRESENT]        [OFF]           [OFF]
> > DISK   6: [  EMPTY]
> > DISK   7: [  EMPTY]
> > DISK   8: [  EMPTY]
> > DISK   9: [  EMPTY]
> > DISK  10: [  EMPTY]
> > DISK  11: [  EMPTY]
> >
> > =================================
> >
> > Fan Bank :
> > ----------
> >
> > Bank                        Speed         Status        Fan State
> >                            ( RPMS )
> > ----                       --------      ---------      ---------
> > CPU0_PRIM_FAN                1298089537        [ENABLED]           OK
> > CPU1_PRIM_FAN                1298089537        [ENABLED]           OK
> > CPU0_SEC_FAN                    0        [DISABLED]         OK
> > CPU1_SEC_FAN                    0        [DISABLED]         OK
> > IO0_PRIM_FAN                 4000        [ENABLED]          OK
> > IO1_PRIM_FAN                 3947        [ENABLED]          OK
> > IO0_SEC_FAN                     0        [DISABLED]         OK
> > IO1_SEC_FAN                     0        [DISABLED]         OK
> > IO_BRIDGE_PRIM_FANfailed in picl_get_propval_by_name for fan speed
> > General system failure
> > Power Supplies:
> > ---------------
> >
> > Supply     Status     Fan Fail  Temp Fail  CS Fail  3.3V   5V   12V   48V
> > ------  ------------  --------  ---------  -------  ----   --   ---   ---
> > PS0      GOOD                                         9     4     3     5
> > PS1      GOOD                                         9     3     3     5
> > PS2      UNPLUGGED
> >
> >
> > ========================= HW Revisions =======================================
> >
> > System PROM revisions:
> > ----------------------
> > OBP 4.5.6 2002/01/04 12:30
> >
> > IO ASIC revisions:
> > ------------------
> >                      Port
> > Brd  Model            ID  Status Version
> > ---- --------------- ---- ------ -------
> > IB-1 unknown          8    ok     4
> > IB-1 unknown          9    ok     4

Responses:
--
could be hardware, but most likely software/firmware

1) make sure you have the I/O board firmware patches installed.
111474-07 or 113312-02
2) make sure you have the picl patches installed
110849-15 (5.8)
113263-05 (5.8)
113447-17 (5.9)
108528-29 (5.8)
109873-25 (5.8)
110852-11 (5.8)
110845-03 (5.8)
110460-32 (5.8)

(others may be needed for 5.9)
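
To check a list like this against what is actually installed, something along the following lines could help. It is only a sketch: it assumes the output of showrev -p has lines beginning "Patch: NNNNNN-RR", the helper names are made up for illustration, and the sample output (including the package name SUNWexample) is invented, not taken from a real machine. The required list holds the two Solaris 9 patch revisions mentioned in this thread.

```python
import re

# Assumed shape of a `showrev -p` line: "Patch: 113447-17 Obsoletes: ... Packages: ..."
PATCH_LINE = re.compile(r"^Patch:\s+(\d{6})-(\d{2})\b")


def installed_revisions(showrev_output):
    """Map each patch base ID to the highest revision seen in the output."""
    installed = {}
    for line in showrev_output.splitlines():
        match = PATCH_LINE.match(line)
        if match:
            base, rev = match.group(1), int(match.group(2))
            installed[base] = max(rev, installed.get(base, -1))
    return installed


def missing_patches(required, installed):
    """Return the required patch IDs that are absent or below the wanted revision."""
    missing = []
    for patch in required:
        base, rev = patch.split("-")
        if installed.get(base, -1) < int(rev):
            missing.append(patch)
    return missing


# Solaris 9 (5.9) patch revisions named in this thread.
REQUIRED_5_9 = ["113447-17", "113573-05"]

# Invented sample output for illustration only (SUNWexample is hypothetical).
sample_output = """\
Patch: 113447-17 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWexample
Patch: 113573-03 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWexample
"""

print(missing_patches(REQUIRED_5_9, installed_revisions(sample_output)))
```

A higher installed revision counts as satisfying the requirement, which matches how patch revisions normally supersede earlier ones, but withdrawn patches would still need to be checked by hand.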
--

I have come across a similar problem with a V880, and the cause was that one
of the free internal SCSI cables got stuck in the fan and was keeping it
from rotating. Just opening the side door and releasing the cable solved
the problem.

--
If you have applied the recommended and security patch bundles, make
sure you also apply the platform specific patches. Probably something
that patches libpiclfrutree or some of the other picl plugin libraries.

(I wanted to, but the particular patch was withdrawn.)
--
Look at this document on the SunSolve web site:

Bug Id: 4700972
Category: firmware
Subcategory: obp
State: integrated
Synopsis: Varied errors on OpenBoot Stop-A (or other error) followed by boot

Description:
Customer encountered error messages with V880 SunOS 5.8,
V880 Highly recommended and 110460-17, 110849-09, OBP 4.5.6.

Jun 7 16:03:31 v4u-880f picld[71]: ERROR running
psvc_fan_fault_check_policy_0 on CPU0_PRIM_FAN (2500736)
Jun 7 16:03:31 v4u-880f picld[71]: I/O error
Jun 7 16:03:32 v4u-880f picld[71]: ERROR running
psvc_fan_fault_check_policy_0 on CPU1_PRIM_FAN (2500768)
Jun 7 16:03:32 v4u-880f picld[71]: I/O error
Jun 7 16:03:35 v4u-880f picld[71]: ERROR running
psvc_fan_fault_check_policy_0 on IO_BRIDGE_PRIM_FAN (2500992)
Jun 7 16:03:35 v4u-880f picld[71]: I/O error

To reproduce:
1) Power ON
2) Stop-A immediately
3) boot at ok prompt

Work around:
1) Power ON
2) Stop-A immediately
3) reset
4) init 0
5) boot

Integrated in releases: 4.x.build_23
Duplicate of:
Patch id:
See also:
Summary:
--



Received on Wed Jun 16 20:17:12 2004
