SUMMARY: Sun E-450 Hangup problem

From: Rizwan Sadiq <rizwan_sadiq_at_hotmail.com>
Date: Sun Nov 28 2004 - 08:44:39 EST
Dear all,

The problem was finally fixed. In a desperate attempt to fix the fault as 
quickly as possible, I took the following steps in parallel:
1. Installed the latest patch cluster (including the glm patch).
2. Replaced the LAN card hme0 with qfe0.
3. I also found errors in the ARP cache: the MAC address recorded for the 
problematic machine in the ARP caches of the other servers differed from its 
actual MAC address. I cleared the ARP cache on all machines.
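
The mismatch described in step 3 can also be looked for mechanically. What 
follows is only a sketch, not what was actually run here. It assumes the 
`arp -a` output of every server has been gathered into one stream, that the IP 
address is the second column and the MAC address the last one, and the name 
`find_mac_conflicts` is made up for illustration. On Solaris 8, `nawk` would 
probably be needed in place of the old `awk`.

```shell
# Hypothetical helper: reads concatenated `arp -a` output on stdin and prints
# every IP address that appears with more than one distinct MAC address.
find_mac_conflicts() {
    awk '
        # Only consider rows whose 2nd field is an IPv4 address and whose
        # last field looks like a MAC address (contains a colon).
        $2 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $NF ~ /:/ {
            ip = $2
            mac = tolower($NF)
            if (!((ip, mac) in seen)) {
                seen[ip, mac] = 1
                if (count[ip]++ == 0) { macs[ip] = mac }
                else { macs[ip] = macs[ip] " " mac }
            }
        }
        END {
            for (ip in count)
                if (count[ip] > 1)
                    print ip, macs[ip]
        }
    '
}

# Example (file names are placeholders for saved `arp -a` output):
#   cat server1.arp server2.arp | find_mac_conflicts
```

An IP address listed with two MAC addresses points to either a stale entry or 
an address conflict. A stale entry can be removed with `arp -d <host>` on each 
machine that holds it.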

After these three steps, I found that the hangup problem was gone. Since 
then the server has been working fine without any problem.

Thanks to all who responded:

Aaron Daniel Vega Villa
Check your firmware level on the disks and system board. Maybe there have 
been memory or CPU errors that your OBP is not handling properly.
Try running at run level 2 or run level 1 so you can determine whether a 
specific service, program, or process is affecting the whole system.

Hope this helps.



Cian O'Sullivan
Sounds like it could be an IP address conflict.

Ghassan Qanzu'a
It seems that your system has been hacked. Could you run the following two 
commands on your server?
# ps -ef  | wc -l
# /usr/ucb/ps aux | wc -l
Do both commands give the same number? If not, then your system has 
definitely been hacked.
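
The two commands above compare only line counts, and counts can differ for 
innocent reasons, since processes start and exit between the two runs. A 
slightly more informative check compares the PIDs themselves. The sketch below 
assumes both outputs were saved to files, that the PID is the second column in 
each, and that the columns do not run together. The name `ps_pid_diff` is 
hypothetical, and `nawk` may be needed on Solaris.

```shell
# Hypothetical helper: $1 is saved `ps -ef` output, $2 is saved
# `/usr/ucb/ps aux` output. Prints PIDs that appear in only one of them.
ps_pid_diff() {
    awk '
        FNR == 1 { file++; next }      # new file starts: skip its header row
        file == 1 { sysv[$2] = 1 }     # PID is assumed to be the 2nd column
        file == 2 { ucb[$2] = 1 }
        END {
            for (p in sysv) if (!(p in ucb)) print "only in ps -ef:", p
            for (p in ucb) if (!(p in sysv)) print "only in ucb ps:", p
        }
    ' "$1" "$2" | sort
}

# Example:
#   ps -ef > /tmp/ps_sysv.txt; /usr/ucb/ps aux > /tmp/ps_ucb.txt
#   ps_pid_diff /tmp/ps_sysv.txt /tmp/ps_ucb.txt
```

Each `ps` run lists itself, so at least one PID on each side is expected to be 
unmatched. Only a long-lived process that one listing shows and the other 
consistently hides would be worth a closer look.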

Ed Guenther
Well, in hindsight, you should have built a new box from
scratch and not touched this one.  Then swap the new
box with this one.  That way if there were problems,
switching back would be no trouble.

I would say that your problem could be incomplete
network connections, i.e. ping of death and the like.
You need to work with your networking people and
determine what connections are getting to the box.
The connections could be at such a low level that your
box may not even note them in netstat output.
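
For the connections that netstat does report, a quick tally by TCP state can 
show whether half-open connections are piling up. This is only a sketch: it 
assumes the state is the last field of each connection line and is written in 
upper case, and the name `count_tcp_states` is hypothetical. As noted above, 
traffic below this level would not appear here at all and would need a packet 
capture on the wire.

```shell
# Hypothetical helper: reads `netstat -an` style output on stdin and prints
# a count per TCP state, highest count first.
count_tcp_states() {
    awk '
        # The state is assumed to be the last field and all upper case,
        # which skips header and separator lines.
        NF >= 5 && $NF ~ /^[A-Z][A-Z0-9_]+$/ { n[$NF]++ }
        END { for (s in n) print n[s], s }
    ' | sort -rn
}

# Example:
#   netstat -an | count_tcp_states
```

A large and growing number of connections stuck in a state such as SYN_RCVD 
would be one hint that incomplete connections are involved.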





My original post:
Dear Admins,
I am managing a Sun E-450 server running Solaris 8 with two processors and 
512 MB RAM. Since yesterday evening I have been facing a strange problem: the 
server hangs suddenly. If we isolate the server from the network by pulling 
out the LAN cable, it does not hang. But when the server is on the network, it 
hangs within just 15 minutes. Surprisingly, the load average, swap 
utilization, I/O wait, top output, etc. show normal values just seconds before 
it hangs.

I am running Apache and qmail on it with effective RBL spam blocking. There 
are no signs of any intrusion. We are using a PIX firewall for security. I 
have a standby server. I changed the IP address of that server and put it in 
place of the problematic one. The standby server shows the same behaviour.
The log files (syslog and messages) do not show any error messages except the 
following:
SCSI: Warning: pci@1f,4000/scsi@3 (glm0)
or, occasionally, this error: SCSI bus reset

I get this error only on the original server. The standby server does not 
log any error; it just hangs without any error message. I applied the Sun 
Recommended patch cluster in October 2003. Now I am downloading the latest 
patch cluster.

This server has been running without any problem for the last 3 years or so, 
and there have been no recent changes or upgrades.
I wonder why this error is appearing. Can anyone guide me on this problem? 
We are an ISP and cannot afford such hangups, as this machine is working as 
our RADIUS, mail, and web server. The load average of the machine is less 
than 2.0 at most and is typically around 0.5.

Please help me solve this issue.




Regards,

Rizwan H. Sadiq

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Sun Nov 28 08:45:29 2004

This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:43:40 EST