Summary: SCSI Bus Transition

From: Pablo Jejcic <pablo.jejcic_at_gmail.com>
Date: Wed Dec 12 2007 - 10:10:05 EST
Hi Guys,
Replies below. Basically, cables were ok, and it was a combination of two
problems:

1) As Brad said, the controller... one of the 3 controllers on the machine
was broken, and after checking everything we managed to find the issue
there, isolate it, and voila! no more errors.

2) Condensation on the back of the server... really odd, but it looks like
at the back of the server there were some tiny drops of water. We
checked everything and nothing was leaking or dripping liquid... then we
saw that the server is installed in a communication rack with a solid door at
the front (air-flow on the V240 is from the front to the back). We
removed the front door, and magic! no more condensation...
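
A rough dew-point estimate helps show when humid air will leave water on
a cooler surface. The sketch below uses the Magnus approximation. No
temperature or humidity readings were recorded for this server, so the
30 C / 80% values in the usage line are purely illustrative assumptions,
and dew_point_c is a hypothetical helper name.

```python
import math

# Magnus approximation constants for water vapour over liquid water,
# commonly quoted for roughly -45 C to +60 C.
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius


def dew_point_c(air_temp_c, relative_humidity_pct):
    """Estimate the dew point in Celsius from air temperature and RH (%)."""
    if not 0 < relative_humidity_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(relative_humidity_pct / 100.0) + (
        MAGNUS_A * air_temp_c / (MAGNUS_B + air_temp_c)
    )
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


if __name__ == "__main__":
    # Illustrative values only, not measurements from this site.
    print(round(dew_point_c(30.0, 80.0), 1))
```

Any surface colder than the printed value would be expected to collect
moisture from air at that temperature and humidity.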

It seems to be working now... but this is one I won't forget easily...

Cheers and thanks to everyone!

Pablo.-.

*Forwarded Conversation*
Subject: *SCSI Bus Transition*
------------------------

* From: Pablo Jejcic* <pablo.jejcic@gmail.com> To:
sunmanagers@sunmanagers.org
Date: 2 December 2007 20:07

Hi Gurus,
I need some confirmation... I'm troubleshooting a remote server:

SunFire V240 + D2 JB.

We have 2 RAID 5 configured on the server.

Everything was working fine until we moved the box to the server room into a
controlled environment... now 2-3 times a week, we get the following set of
errors:

WARNING: /pci@1d,700000/pci@1/scsi@4 (qus0):
        SCSI Bus Transition
WARNING: /pci@1d,700000/pci@1/scsi@4 (qus0):
        Received unexpected SCSI Reset

Then a few hours after we get the warnings, we lose the disks, the RAIDs,
everything on the external array....
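
To track how often these warnings recur, a short script can tally them.
This is only a minimal sketch: it assumes the messages appear exactly in
the two-line layout quoted above (a WARNING header line followed by the
message text on the next line). A real syslog file may prefix each line
with a timestamp and host name, which this sketch does not strip. The
function name count_scsi_warnings is illustrative.

```python
import re
from collections import Counter

# Header line as quoted in this thread, e.g.
# "WARNING: /pci@1d,700000/pci@1/scsi@4 (qus0):"
WARNING_RE = re.compile(r"WARNING:\s+(?P<path>\S+)\s+\((?P<instance>\w+)\):\s*$")


def count_scsi_warnings(lines):
    """Count (driver instance, message) pairs from two-line warnings."""
    counts = Counter()
    pending = None  # instance from the most recent header line, if any
    for raw in lines:
        line = raw.rstrip("\n")
        match = WARNING_RE.search(line)
        if match:
            pending = match.group("instance")
            continue
        if pending is not None and line.strip():
            # The line right after a header carries the message text.
            counts[(pending, line.strip())] += 1
        pending = None
    return counts


if __name__ == "__main__":
    sample = [
        "WARNING: /pci@1d,700000/pci@1/scsi@4 (qus0):",
        "        SCSI Bus Transition",
        "WARNING: /pci@1d,700000/pci@1/scsi@4 (qus0):",
        "        Received unexpected SCSI Reset",
    ]
    for (instance, message), n in count_scsi_warnings(sample).items():
        print(instance, message, n)
```

Feeding it each day's log extract would show whether the resets cluster
around particular times, which may help correlate them with A/C cycles
or array activity.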

My guesses here:
1- Problem with the termination of the SCSI chain - the D2 has automatic
terminators, but I'm guessing some problem with them could be causing this.
2- Problems with the SCSI cables: some of the pins, or something else, is
wrong, and they might be a bit loose; with the vibration from the storage
array they pop out, and we start getting the issues.
3- SCSI controller issues... but I don't understand how this could be the
cause, as the errors should be more frequent, or they should be there all the
time.
4- A couple of the disks in one of the pictures I got of the array look
like they don't have the "cover" on (the tray to slot them into the
array)... so they might be moving... but why do all the other ones go off?

The server is in a very humid environment, but we moved it into the data
centre because we thought that the A/C would help to reduce the problem...
but it just made it worse.

Any comments, ideas, suggestions are very welcome

Thanks a lot in advance,

Pablo.-.



--------
* From: Jeff Marble* <jrmarble@gmail.com> Reply-To: JRMarble@gmail.com
To: Pablo Jejcic <pablo.jejcic@gmail.com>
Date: 3 December 2007 02:19

Another common problem with the copper cables is bent pins.  Check
each cable end carefully for any of the small pins that might not be
straight.

Jeff
> _______________________________________________
> sunmanagers mailing list
> sunmanagers@sunmanagers.org
> http://www.sunmanagers.org/mailman/listinfo/sunmanagers
>



--
Jeff Marble
JRMarble@GMail.com

--------
* From: Sajan* <sajhnair@yahoo.co.in> To: Pablo Jejcic <
pablo.jejcic@gmail.com>
Date: 3 December 2007 07:17

Hi

 The same thing happened to me two months ago. Ours is in a humid
environment too. What I did was open up the whole server and reseat
everything, including the HDDs, RAM and even the SCSI connectors. Try it;
this will surely solve the problem. Don't do it in a hurry; take your time.

Regards





--------
* From: Pablo Jejcic* <pablo.jejcic@gmail.com> To: Sajan <
sajhnair@yahoo.co.in>
Date: 3 December 2007 08:35

Thanks Sajan,
But the server is in Angola, and I'm in Scotland at the moment :S

Cheers!
Pablo.-.
--
#########################################
Pablo Jejcic
"He sospechado alguna vez que la única cosa sin misterio es la felicidad,
porque se justifica por sí sola."
Jorge Luis Borges, escritor argentino (1899-1986)
Blog with me at http://hachetheboss.blogspot.com
#########################################

--------
* From: Pablo Jejcic* <pablo.jejcic@gmail.com> To: JRMarble@gmail.com
Date: 3 December 2007 08:36

Thanks Jeff, will check and see... the only problem is that the server is
in Angola, and I'm in Aberdeen... :S

Pablo.-.

--------
* From: Christopher Barnard* <cbarnar1@earthlink.net> To: Pablo Jejcic <
pablo.jejcic@gmail.com>
Date: 3 December 2007 14:10

Moving the server to an A/C room was most definitely the right thing.
A/C both cools and dehumidifies.

I do not believe there is any issue with not having the cover on the
case.  It is purely decorative.

I would suspect the cables first, since they are easily replaced.  Do
you have someone who can support the machines at this remote site?  I
would have him or her replace all of the SCSI cables.  If after a
couple of weeks the problem recurs, then it's time to suspect the (much
harder to fix) internal bus termination.

Christopher L. Barnard                           cbarnar1@earthlink.net
-----------------------------------------------------------------------
 When I was a boy, I was told that anyone could be president.  Now I am
 beginning to believe it.                            -- Clarence Darrow

--------
* From: Pablo Jejcic* <pablo.jejcic@gmail.com> To: Christopher Barnard <
cbarnar1@earthlink.net>
Date: 4 December 2007 23:32

Thanks!
I'm getting the cables replaced this week. If that fails, I will connect the
whole array to only one dual-controller, and if that fails, then I will set
the server on fire and ask for a new one ;)

Thanks!!!

Pablo.-.

--------
* From: Brad Morrison* <brad.morrison@gmail.com> To: Pablo Jejcic <
pablo.jejcic@gmail.com>
Date: 5 December 2007 18:26

It's almost certainly the controller, if you've checked all of the other
physical elements you listed.

Bad SCSI controllers are 100% unpredictable. Unfortunately, they're on the
mainboard for a v240.



--------
Received on Wed Dec 12 10:10:23 2007

This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:44:07 EST