Hi managers!
Original question:
>
> I'm getting a lot of the following syslog messages lately:
>
> esp0: Target 1 now Synchronous at 4.0 mb/s max transmit rate
>
> They occur in bursts up to several dozen times a day. syslogd will
> typically append something like "last message repeated 32 times" to
> the entries in /var/adm/messages. The machine is working fine.
>
> The machine is a SS2 with 2 disks (sd0 internal 200MB, sd1 external 1GB),
> no other SCSI devices.
>
> Any clues, someone? (I'll summarize to the list).
>
I forgot to mention my OS release (4.1.2). Sorry about that.
I received several replies (see below), ranging from "install patch # xxxx"
to "your disk could be going bad". The solution is probably to install
SunOS 4.1.3 as some people suggested, but I won't have time for that in the
next few days, so it is not verified yet. SCSI bus termination probably does
not have anything to do with it: the problem shows up regardless of whether
I have a terminator on the external drive (I tried this without rebooting).
The cable length could be the culprit (1.5m). I'll try a shorter one, but
I cannot reboot until (approx.) Friday, so that has to wait until then.
John DiMarco mentioned a "scsiinfo" package, but I have not been able to
retrieve it from ftp.cdf.toronto.edu; that host seems to be down (ftp
times out on connect, ping receives no answer).
Some people were interested in any information about the problem. It seems
that the problem occurs at other sites too and that Sun has trouble giving
accurate advice.
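
For anyone who wants to quantify how often the renegotiation messages show
up, here is a minimal Python sketch that tallies them per SCSI target and
folds in syslogd's "last message repeated N times" lines. The function name
count_sync_messages, the sample hostname and the timestamps are made up for
illustration, and the exact layout of lines in /var/adm/messages is an
assumption; only the esp0 message text is taken from the question above.

```python
import re
from collections import Counter

# Matches the driver message quoted in the question, e.g.
# "esp0: Target 1 now Synchronous at 4.0 mb/s max transmit rate"
SYNC_RE = re.compile(r"esp\d+: Target (\d+) now Synchronous at [\d.]+ mb/s")
# Matches syslogd's repeat marker, e.g. "last message repeated 32 times"
REPEAT_RE = re.compile(r"last message repeated (\d+) times?")


def count_sync_messages(lines):
    """Return a Counter mapping SCSI target number to message count.

    A "last message repeated N times" line adds N to the target of the
    immediately preceding sync message. Repeat lines that follow any
    other kind of message are ignored.
    """
    counts = Counter()
    last_target = None  # target of the most recent sync message, if any
    for line in lines:
        sync = SYNC_RE.search(line)
        repeat = REPEAT_RE.search(line)
        if sync:
            last_target = int(sync.group(1))
            counts[last_target] += 1
        elif repeat:
            if last_target is not None:
                counts[last_target] += int(repeat.group(1))
        else:
            # An unrelated message: later repeat lines refer to it instead.
            last_target = None
    return counts


# Illustrative sample only; hostname and timestamps are invented.
SAMPLE = [
    "Mar  3 14:22:01 myhost vmunix: esp0: Target 1 now Synchronous at 4.0 mb/s max transmit rate",
    "Mar  3 14:22:31 myhost last message repeated 32 times",
    "Mar  3 14:30:00 myhost vmunix: some unrelated message",
    "Mar  3 14:30:10 myhost last message repeated 2 times",
]

if __name__ == "__main__":
    for target, n in sorted(count_sync_messages(SAMPLE).items()):
        print("target %d: %d messages" % (target, n))
```

To run it against a real log, pass the open file instead of SAMPLE, for
example count_sync_messages(open("/var/adm/messages")). A count that keeps
climbing for one target would be consistent with the "frequently changing
transfer rate" that John DiMarco describes below.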
Thanks to the following people:
pjy@mso.anu.edu.au (Peter Young)
Roland Hamblin EOS <rolandh@eos.co.uk>
Birger.Wathne@vest.sdata.no (Birger A. Wathne)
barnes@sde.mdso.vf.ge.com (Barnes William)
heiser@tdwr.ed.ray.com (Bill Heiser)
Christian Lawrence <cal@soac.bellcore.com>
Geert Jan de Groot <geertj@ica.philips.nl>
louis@andataco.com
bobh@ide.com
nick@dsd.es.com (Nick Nickerson)
glenn@uniq.com.au (Glenn Satchell - Uniq Professional Services)
John DiMarco <jdd@db.toronto.edu>
john@oncology.uthscsa.edu (John Justin Hough)
drtr@mrao.cam.ac.uk (David Robinson)
-- Sten
Here are the replies:
***** From: pjy@mso.anu.edu.au (Peter Young) **********************************
Hi Sten,
You need to apply patch 100484-01.
Regards
***** From: Roland Hamblin EOS <rolandh@eos.co.uk> ****************************
We, too, have had the same problem, with the same configuration. It doesn't
cause any system problems, but is just annoying.
As far as I can remember from what Sun said, there is no solution. I think the
reason for them is that the SCSI bus encounters a weird state and
re-synchronises itself.
Roland
***** From: Birger.Wathne@vest.sdata.no (Birger A. Wathne) ********************
Either damaged cabling, or your external disk (target 1) is dying. I
hope you make backups.....
It could be that the servo mechanism is getting 'off track' if the disk
has been transported, or is mounted sideways. If this is the case, you
may be able to fix the disk by formatting. But it can also be the
on-board SCSI-controller or servo mechanics failing.....
Birger
***** From: barnes@sde.mdso.vf.ge.com (Barnes William) ************************
We had that problem with a server that was set up as follows:
Sparc 1+ with 200 Meg internal, 1 Sun boot box with 1/4" Qic24 and 669 Disk,
1 Sun boot box with 2 669 disks. When we asked Sun about the message they
said to take off the external terminator. No explanation just remove it. We
did and the messages went away. These boot boxes do not have any termination
internally, but that fixed our problem.
Bill Barnes
***** From: heiser@tdwr.ed.ray.com (Bill Heiser) ******************************
This is a <=4.1.2 bug. If you upgrade to 4.1.3, it will go away.
I think there is also a patch for 4.1.2, but I didn't use it - I upgraded
to 4.1.3 instead.
***** From: Christian Lawrence <cal@soac.bellcore.com> ************************
Are you running 4.1.2? Then you need 100484-01.
***** From: Geert Jan de Groot <geertj@ica.philips.nl> ************************
Looks like the classic SCSI problem: too long cables, bad termination,...
Are you sure the external disk is terminated only ONCE?
Geert Jan
***** From: louis@andataco.com ************************************************
Want some guesses? Those messages mean that the drive at target 1 is
resetting. The sd driver is apparently sufficiently robust to "drive"
through such an occurrence.
The usual suspects are cables and termination. Check for tight
connections everywhere.
Less likely is that your sd1 is going bad. Backups current?
***** From: bobh@ide.com ******************************************************
Hi,
You're not alone. We have a 4/690 that is doing the same thing. The difference
is, I have an exabyte tape drive on it which I upgraded to an 8500 model.
Before the upgrade I did not notice these messages. Only since.
I look forward to your summary.
Thanks,
Bob Hudson
***** From: nick@dsd.es.com (Nick Nickerson) **********************************
I have a SS2 clone system that exhibits the exact same behavior. I
would be very interested in any information you receive on this one.
Thanks
***** From: glenn@uniq.com.au (Glenn Satchell - Uniq Professional Services) ***
Hi Sten,
Check your scsi cables and termination. Sounds like you've got a bad
connection somewhere. What is happening is that the scsi bus is being
reset and the disks are having to renegotiate with the scsi controller
on the sun.
regards
***** From: John DiMarco <jdd@db.toronto.edu> *********************************
It may be a noisy or overlong cable or bad termination.
The esp host adapter dynamically renegotiates the synchronous transfer rate
with a device when it detects noise (eg. frequent parity errors), sometimes
even falling back to asynchronous mode. Before 4.1.3, SunOS would report
one of the above messages whenever the transfer rate was renegotiated. Those
are the messages you are seeing.
If you want to confirm that this is indeed what is happening, grab "scsiinfo"
from ftp.cdf.toronto.edu in pub/scsiinfo. This'll grub around in the
kernel and tell you the current synch transfer rate of all the scsi devices
known to your kernel, and which ones have been tagged as "noisy". If scsiinfo
reports that target 1 is noisy, or if it shows a frequently changing
transfer rate, you've got this problem.
John
***** From: john@oncology.uthscsa.edu (John Justin Hough) *********************
Sten,
Please post any response to this you get. I had an IPX that did
this. It doesn't do it now, but I didn't change anything on it and
would really like to know what it was.
***** From: drtr@mrao.cam.ac.uk (David Robinson) *****************************
You didn't say what version of SunOS you are running. Is it 4.1.2?
In 4.1.1 (I think) the sun would not dynamically change the transmit rate,
hence you only saw these messages at boot up.
In 4.1.2 (maybe) they added the feature that the sun will update the
transmit speed, depending on how fast it finds the devices accepting data.
It also prints a message telling you the rate was changed.
In 4.1.3, I think they kept the feature, but removed the messages (even on
bootup) because they kept on being asked 'what does it mean?'
That is from memory. There is probably a patch to disable these messages.
David Robinson (drtr@mrao.cam.ac.uk)
--------------------------------------------------------------------------
Sten Gunterberg, ERGON Informatik AG, Zuerich, Switzerland
gunterberg@ergon.ch (internet) /S=Gunterberg/O=Ergon/A=EUnet/C=CH/ (X.400)