Summary: A5x00 Failure; cannot bring loop A online

From: Wesley W. Garland <wes_at_page.ca>
Date: Wed Jan 14 2004 - 14:12:11 EST

Hi, Sun Managers!

The problem has been solved. It turns out that the replacement IB (Interface Board) was either defective or the wrong revision. Here's where it gets interesting: it turns out it was the A5200 I was having problems with, not the A5000. Whoops. :)  By the time I'd gotten to the data center, it was no longer flickering the lightning bolt, but reporting a failed IB.

The replacement IB, which the vendor said would work with either the A5000 or the A5200 (but invoiced as "A5000 IB"), is stamped with Sun Part Number 340-4069-04 and stickered "-06 REV 52". To fix it, I used an IB from my lab A5200, marked with the same part number but stickered "-07 REV 50". Ironically enough, the one from the lab is date coded 98/51 while the replacement is date coded 99/47.

I also found that one of the IBM GBICs connected to the hub for that channel (going to an HBA) had failed. I wonder if the flickering lightning-bolt-state is hard on the equipment, or if there are Gremlins in the system?

I received some excellent advice while trying to fix this problem:

Octave Orgeron: 
 - Double-check firmware revisions in HBA, A5000, IB. 
 - Double-check GBIC with a loopback cable. 
 - Patch matrix for A5x00, HBAs, etc. here: http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=finfodoc%2F43212&zone_110=43212

Scott Mickey: 
 - Try the A5200 IB from your lab. (Good advice!)
 - Note that while the Sun part number for A5000 and A5200 IB's are the
   same, I think the revision levels are different, so IB's from
   A5000's should not be deployed in A5200's (I think we have a winner!)
 - Did you know that many datacenters replace their fibre once a year?
   (No, I didn't, I think mine will.. we only rolled out FCAL in Sep/03)
 - A5000 Configuration Guide: http://docs-pdf.sun.com/805-0264-15/805-0264-15.pdf
 - Sun X6732A hub is actually a Vixel 1000 (they even say "Vixel" on the bottom). The Vixel manual is here: http://www.sms.com/support/Vixel/Rapport%201000/InstallGuide_00041017-001_D.pdf
 - You should power up the Vixel hub before the rest of the equipment (I didn't know that, but I had been doing it that way "by luck" -- as the hubs have no power switches)
 - Check your logs for messages (wow, it filled up /var/adm..):
   Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
   Jan 11 09:51:18 zaphod  Loop reconfigure in progress
   Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info]   /pci@1f,4000/SUNW,ifp@2 (ifp0):
   Jan 11 09:51:18 zaphod  LIP reset occured; cause f801
   Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
   Jan 11 09:51:18 zaphod  Loop reconfigure done
   Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
   Jan 11 09:51:18 zaphod  LIP occured; cause f801
   Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
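
Since these messages filled up /var/adm, a quick tally per HBA instance makes it easier to see which loop is flapping. The following is only a minimal sketch, assuming the two-line layout shown above (a "scsi:" header line naming the device, followed by a line carrying the message); the function name count_loop_events and the SAMPLE text are made up for illustration, and other drivers or syslog configurations may format their lines differently.

```python
import re
from collections import Counter

# Header line: "<timestamp> <host> scsi: [ID ...] <device path> (<instance>):"
HEADER = re.compile(r"scsi: \[ID \d+ kern\.\w+\]\s+(\S+) \((\w+)\):\s*$")
# Continuation line: "<Mon> <day> <HH:MM:SS> <host>  <message>"
MESSAGE = re.compile(r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+\s+(.*)$")

# Sample taken from the log excerpt above (hypothetical variable name).
SAMPLE = """\
Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
Jan 11 09:51:18 zaphod  Loop reconfigure in progress
Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info]   /pci@1f,4000/SUNW,ifp@2 (ifp0):
Jan 11 09:51:18 zaphod  LIP reset occured; cause f801
Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
Jan 11 09:51:18 zaphod  Loop reconfigure done
Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
Jan 11 09:51:18 zaphod  LIP occured; cause f801
Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
"""


def count_loop_events(lines):
    """Count (instance, message) pairs from header/continuation line pairs."""
    counts = Counter()
    instance = None  # instance named by the most recent header line, if any
    for raw in lines:
        line = raw.strip()
        header = HEADER.search(line)
        if header:
            instance = header.group(2)
            continue
        body = MESSAGE.match(line)
        if body and instance is not None:
            counts[(instance, body.group(1))] += 1
        instance = None  # a message only belongs to the header right before it
    return counts


if __name__ == "__main__":
    for (inst, msg), n in count_loop_events(SAMPLE.splitlines()).most_common():
        print(f"{n:6d}  {inst}  {msg}")
```

To run it against a real log, pass an open file instead of the sample, e.g. count_loop_events(open("/var/adm/messages")), assuming that is where your syslog writes kernel messages.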

Also, I learned one more tidbit from the A5000 troubleshooting PDF: you're supposed to use the GBICs in a particular order in the Vixel hubs to prevent signal degradation. I didn't change any of my running hubs (which are using ports 1, 5, and 6), but I clustered the GBICs in the hub connected to the broken IB so that it is using ports 3, 4, and 5, just in case.

Thanks a million, guys!

Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
Received on Wed Jan 14 14:12:08 2004