SUMMARY: zs3 hangs system

From: Dixon Ly (dly@netcom.com)
Date: Thu Jan 05 1995 - 03:00:46 CST


Here's my original question:

> We are running Solaris 2.3 on a Sparc 10/40. Fairly frequently,
> we get a 'zs3: ring overflow'. This totally
> hangs, then halts the system...preventing any access to the keyboard
> and any other input devices. The man page for 'zs' says the
> message means:
> The driver's character input ring buffer overflowed
> before it could be serviced.
>
> Great, now how do I fix it? BTW, this only happens on our two 10/40s.
> None of the Sparc 2 or 20's are experiencing this problem.
>
> I noticed that patch 101318 fixes some of the problems in the serial
> driver (zs), is this the way to go? (Sorry, can't really try it out
> first before asking because the machines are at a remote site).
>

So far, I have received one report that 101318 fixes the problem.

sunrise!don@rambone.psi.net (see below) replies with the bug ID for
this problem.

tony.walton@UK.Sun.COM reports that the zs3 overflow problem is a side
effect of a problem with the cg6 driver, and advises the use of patch
101493. The Sparc 10's that we are using are in fact using the GX
graphics cards (cg6), so this looks good.

I've already forwarded this to our remote site manager, and we'll
cross our fingers and hope this will fix our troubles. If anyone
is interested, you can email me in a couple of weeks. We'll then
know for sure if the problem has gone away after some uptime.

The following are the replies that I received. Thanks to everyone
for replying so promptly....and I apologize for not putting this
up sooner as I was off for the Xmas - New Year break.

-d

===========================================================================

>From irana@hydres.co.uk Tue Jan 3 04:00:27 1995

Hi there,

Yes, application of 101318-36 worked for us. You will
not experience this on all machines. So only apply the
patch to those machines experiencing zs overflows.

Irana

===========================================================================

>From vandras@sch.bme.hu Mon Jan 2 04:34:46 1995

I read your article in the info.sun-managers newsgroup. Seems like we have
the same problem with our computer! We have a Sparc 20 with Solaris 2.3.

By the way, our computer doesn't hang for good, it always comes back
after 10 minutes or so. Then it puts the 'zs3...' message into the
console window and goes on like nothing has happened.

I'd guess the bug is in the virtual memory management because we get this
annoying thing when we run two big programs at the same time and the
machine is paging all the time. We don't even have a zs3 device installed!

One interesting thing (it may not be important at all): although our
machine is surely a Sparc 20, the OpenWindows' Workstation Info reports
it's a Sparc 10.

If you get to know how to fix that zs3, please let me know! It's really
getting very annoying, and our techsupt people can't cope with it.

Thanks in advance.

A.V.

===========================================================================

>From kevin@uniq.com.au Thu Dec 29 15:47:16 1994

It means you have a *heap* of characters coming in on a line. I haven't
ever seen this message, even at 76.8K. What is attached there?

                l & h,
                kev

===========================================================================
>From tony.walton@UK.Sun.COM Fri Dec 30 03:21:20 1994

Are you running a cg6 framebuffer? If so you need patch
101493.

If you're running cg6 without this patch the system can hang
up due to a bug in the cg6 driver - the zs3 message is
merely a symptom of this (the system has hung so can't process
interrupts coming in from zs3 (the mouse)).

Hope this helps

TonyW

===========================================================================
>From jle@soft03.ams.co.at Fri Dec 30 00:25:02 1994

I have the same problem you described in your mail, but on a Sparc-Station 5.
I've already installed patch 101318, without success. It would be kind if
you could send me any help you receive, because we have also been trying to
fix this problem for a long time.

I also see machines hanging under Solaris 2.3 on a SparcClassic,
but without any error message. It just hangs for some minutes and then works
fine again.
 
Thanks for any summary and greetings from Joerg

===========================================================================
From: guy@netapp.com (Guy Harris)

"zs3" is the fourth serial port on the machine; the first two are serial
ports A and B, the third is the keyboard port, and the fourth is the
mouse port.

A ring buffer overflow generally means that characters are coming into
the serial port at a very high speed, and I wouldn't expect the mouse to
do that; perhaps either your mouse, or your keyboard (the mouse connects
to the keyboard) is broken?

If possible, try swapping the keyboard and mouse on one of your 10/40's
with the keyboard and mouse on one of the other machines; if the problem
travels with the keyboard and mouse, get the keyboard and mouse
replaced.

If the problem *doesn't* travel with the keyboard and mouse, there might
be some software problem with the serial driver (the stuff from the
mouse *does* go through the serial driver before it goes through the
mouse STREAMS module upstream of it), or there might be some hardware
problem with the 10/40's themselves.
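
As a rough illustration of the mechanism described above (this is not the
actual zs driver code), here is a minimal user-space C sketch of a fixed-size
ring buffer. All names (struct ring, ring_put, ring_get, RING_SIZE) and the
size of 8 are hypothetical. It shows that bytes get dropped when the consumer
stops draining the ring, whether because input arrives too fast or because
the rest of the system is stalled.

```c
#include <stddef.h>

#define RING_SIZE 8  /* hypothetical size, chosen small for the example */

struct ring {
    unsigned char buf[RING_SIZE];
    size_t head;        /* next slot the receive side writes */
    size_t tail;        /* next slot the consumer reads */
    size_t count;       /* bytes currently queued */
    unsigned overruns;  /* bytes dropped because the ring was full */
};

/* Producer: called once per received byte. Returns 0, or -1 if dropped. */
static int ring_put(struct ring *r, unsigned char c)
{
    if (r->count == RING_SIZE) {
        r->overruns++;  /* no room left: the byte is lost */
        return -1;
    }
    r->buf[r->head] = c;
    r->head = (r->head + 1) % RING_SIZE;
    r->count++;
    return 0;
}

/* Consumer: returns the oldest queued byte, or -1 if the ring is empty. */
static int ring_get(struct ring *r)
{
    unsigned char c;

    if (r->count == 0)
        return -1;
    c = r->buf[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    r->count--;
    return c;
}

/* Feed n bytes while nothing drains the ring; return how many were lost. */
static unsigned simulate_stalled_consumer(unsigned n)
{
    struct ring r = {0};
    unsigned i;

    for (i = 0; i < n; i++)
        (void)ring_put(&r, (unsigned char)i);
    return r.overruns;
}

/* Feed n bytes, draining one byte after each arrival; return bytes lost. */
static unsigned simulate_prompt_consumer(unsigned n)
{
    struct ring r = {0};
    unsigned i;

    for (i = 0; i < n; i++) {
        (void)ring_put(&r, (unsigned char)i);
        (void)ring_get(&r);
    }
    return r.overruns;
}
```

With a ring of 8 slots, simulate_stalled_consumer(20) reports 12 dropped
bytes, while simulate_prompt_consumer(20) reports none. Even a slow source
such as a mouse will overflow the ring if nothing services it for long enough.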

===========================================================================
>From sunrise!don@rambone.psi.net Thu Dec 29 16:29:43 1994

There is a recent posting of a bug id centered around your
problem. See the attached article:

#########################################################################
Bug Id: 1172410
 Category: graphics
 Subcategory: cg6
 Release summary: s1093, sol2.3_edII
 Synopsis: Customer runs his application and gets zs3: ring buffer overflow
         Integrated in releases:
 Patch id:
 Description:

Customer has noticed that when doing an enlargement in their pcb layout pgrm,
the system will hang for several minutes. During that time, no activity from the
system can be observed, not even via the network. After a while, the system will
come back. Since some of these systems are nfs servers for other systems, this
problem has a cascading effect on other systems.

Customer has also noticed this problem in OW, not just their application (Mentor
Graphics). At about the same time that the problem occurs, the following error
is printed to the console:

zs3: ring buffer overflow

Customer has the following patches installed:

101316-01
101317-09
101318-45
101327-06
101329-15
101331-03
101344-07
101347-01
101498-02

I have isolated the area in source code from which the error is being generated:

        if (za->za_sw_overrun > 10) {
#ifdef ZSA_DEBUG
                mutex_enter(zs->zs_excl_hi);
                zsa_h_log_add('@');
                mutex_exit(zs->zs_excl_hi);
#endif
                cmn_err(CE_NOTE, "zs%d:ring buffer overflow\n",
                    UNIT(za->za_dev));
                za->za_sw_overrun = 0;
        }
 
This is line 2016 of zs_async.c. I should also point out that the customer is
running Solaris 2.3 II and I only have access to Solaris 2.3.
 History:
         Submitter: deguc Date: 07/18/94
         Dispatch Operator: bugtraq Date: 07/20/94
         Evaluator: mbag Date: 07/20/94
         Closeout Operator: bharat@eng Date: 09/27/94

###############################################################################
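
A note on the fragment quoted in the bug report above: the message is only
logged once the software overrun counter exceeds 10, and the counter is then
reset. The user-space C sketch below mimics that logic. It assumes, which the
fragment does not show, that the counter is incremented once per dropped
character and checked right after each increment. The names struct
port_state, record_overrun and messages_for_overruns are hypothetical.

```c
#include <stdio.h>

#define OVERRUN_REPORT_THRESHOLD 10  /* the "> 10" in the quoted fragment */

struct port_state {
    int unit;             /* e.g. 3 for zs3 */
    unsigned sw_overrun;  /* stand-in for za_sw_overrun */
    unsigned messages;    /* notes printed so far */
};

/* Record one dropped character; returns 1 if a note was printed. */
static int record_overrun(struct port_state *p)
{
    p->sw_overrun++;  /* assumption: one increment per dropped character */
    if (p->sw_overrun > OVERRUN_REPORT_THRESHOLD) {
        fprintf(stderr, "zs%d: ring buffer overflow\n", p->unit);
        p->sw_overrun = 0;  /* start counting toward the next note */
        p->messages++;
        return 1;
    }
    return 0;
}

/* Return how many notes n consecutive overruns would produce on zs3. */
static unsigned messages_for_overruns(unsigned n)
{
    struct port_state p = { 3, 0, 0 };
    unsigned i;

    for (i = 0; i < n; i++)
        (void)record_overrun(&p);
    return p.messages;
}
```

Under those assumptions, 10 overruns print nothing and the 11th prints the
first note, so a single console message already stands for at least 11
dropped characters.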

At this point in time there is no apparent fix for the problem. You
might try submitting an email inquiry to the above listed Closeout
Operator and see if you can gain any new info.

Try sending to: bharat@eng.sun.com

and see if you get a reply.

Best of Luck!
-Don


