SUMMARY: Oracle drops core on SIGSEGV,SIGBUS. Solaris or hardware failure?

From: Vitaly Beliaev (vit@mmk.ru)
Date: Mon May 04 1998 - 23:41:15 CDT


Hello sunners!

My sincere thanks to the following managers who sent me their
suggestions:

Eugene Kramer <eugene@uniteq.com>
Gert Marais <maraisg@saps.org.za>
Denvers, Simon C <DenveSC@europe.stortek.com>

We installed a set of patches for Oracle 7.3.4, and it looks like that
solved the problem: the error messages have disappeared. In any case,
installing the latest patches for Solaris 2.5.1 (the recommended patch
cluster) would be useful as well. Make sure the system runs a recent
revision of the kernel patch (patch number 103640-X). The latest revision
is 19. You may download patches from http://sunsolve1.sun.com.
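
To compare the running kernel patch revision against the latest one, a
minimal sketch in Python follows. It assumes the version string has the
form shown by uname -v in my original question (Generic_103640-08); the
function names are my own and purely illustrative.

```python
import re

def parse_kernel_patch(version):
    """Split a string such as 'Generic_103640-08' into (patch_id, revision)."""
    match = re.fullmatch(r"Generic_(\d+)-(\d+)", version.strip())
    if match is None:
        return None  # plain 'Generic' or a format this sketch does not know
    return match.group(1), int(match.group(2))

def kernel_patch_is_current(version, patch_id="103640", min_revision=19):
    """True if the kernel patch matches patch_id at min_revision or newer."""
    parsed = parse_kernel_patch(version)
    if parsed is None or parsed[0] != patch_id:
        return False
    return parsed[1] >= min_revision

if __name__ == "__main__":
    # Version string taken from the original report below.
    print(kernel_patch_is_current("Generic_103640-08"))
```

On a live system the string could come from os.uname().version instead
of a literal.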

If you suspect faulty memory modules, look at /var/adm/messages. Any ECC
errors logged there will shed enough light on your suspicion.
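
A quick way to pull such lines out of the log is sketched below. I do
not reproduce the exact wording of the kernel messages here, so the
sketch simply assumes that relevant lines contain the word "ECC"; adjust
the pattern to whatever your system actually logs.

```python
import re

# Assumption: ECC-related log lines contain the standalone word "ECC".
ECC_PATTERN = re.compile(r"\becc\b", re.IGNORECASE)

def find_ecc_lines(lines, pattern=ECC_PATTERN):
    """Return the lines (without trailing newline) that match the pattern."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]

def scan_messages(path="/var/adm/messages"):
    """Scan a syslog-style file and return its ECC-related lines."""
    with open(path, errors="replace") as handle:
        return find_ecc_lines(handle)

if __name__ == "__main__":
    for hit in scan_messages():
        print(hit)
```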

Yury Kuksa, a Sun CIS SE, advised the following:

All memory paths are ECC protected. Every ECC error is reported on the
console (have a look at /var/adm/messages). To make ECC diagnostics more
detailed, you can add the following entries to /etc/system

 set report_ce_log=1
 set report_ce_console=1

and reboot the machine for the changes to take effect.
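
Before rebooting, it may be worth checking that both entries really are
in /etc/system. The following is a rough sketch only: it handles plain
"set name=value" lines and "*" comment lines, and the helper names are
made up for this example.

```python
WANTED = {"report_ce_log": "1", "report_ce_console": "1"}

def parse_system_settings(text):
    """Return {name: value} for each 'set name=value' line in the text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):  # '*' begins a comment line
            continue
        if line.startswith("set ") and "=" in line:
            name, value = line[4:].split("=", 1)
            settings[name.strip()] = value.strip()
    return settings

def missing_settings(text, wanted=WANTED):
    """List the wanted settings that are absent or have another value."""
    current = parse_system_settings(text)
    return sorted(n for n, v in wanted.items() if current.get(n) != v)

if __name__ == "__main__":
    with open("/etc/system") as handle:
        print(missing_settings(handle.read()))
```

An empty list means both entries are present with the value 1.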

Well, thanks again to all brave sunners for the help!

Vitaly Beliaev

My original question follows:

:
:Greeting Managers!
:
:I'm looking for your help and counting on your experience. If anybody
:has come across the problem described below, please share your
:opinions. Any hint is welcome!
:
:After six months of running Oracle DB server 7.3.4 in production, our
:Oracle server suddenly started reporting several error messages. The
:majority of them pertain to SIGSEGV and SIGBUS signals and heap errors.
:This is the first time I have faced this problem. At present I have two
:hypotheses: some "hidden" failure of the memory modules installed in our
:production Sun Ultra Enterprise 5000 server, or perhaps I need to apply
:the latest patches to Solaris or Oracle.
:
:Does anybody have any idea on how to solve the problem?
:
:Sincerely,
:Vitaly Beliaev
:
:
:
:Here is additional information that may be useful:
:
:server: Sun Ultra Enterprise 5000, 512Megs RAM
:OS: SunOS emperor 5.5.1 Generic_103640-08 sun4u sparc SUNW,Ultra-Enterprise
:
:
:And three Oracle error messages:
:
:Dump file /oraprog/admin/A/udump/a_ora_11045.trc
:Oracle7 Server Release 7.3.4.0.1 - Production
:With the distributed, replication and parallel query options
:PL/SQL Release 2.3.4.0.0 - Production
:ORACLE_HOME = /oraprog/app/oracle/product/7.3.4
:System name: SunOS
:Node name: emperor
:Release: 5.5.1
:Version: Generic_103640-08
:Machine: sun4u
:Instance name: A
:Redo thread mounted by this instance: 1
:Oracle process number: 22
:Unix process pid: 11045, image: oracleA
:
:*** 1998.03.28.10.35.19.000
:*** SESSION ID:(90.3372) 1998.03.28.10.35.19.000
:Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x0, PC:
:*** 1998.03.28.10.35.19.000
:ksedmp: internal or fatal error
:----- Call Stack Trace -----
:calling call entry argument values in hex
:location type point (? means dubious value)
:-------------------- -------- -------------------- ----------------------------
:Cannot find symbol in /usr/lib/libsocket.so.1.
:ksedmp()+156 CALL ksedst()+0 749B54 ? 749B40 ? 749B20 ?
: 44 ? EFFFD2F4 ? 0 ?
:ssexhd()+376 CALL ksedmp()+0 3 ? 0 ? 1 ? EFFFD80C ? 1 ?
: 7 ?
:EF5B8D18 PTR_CALL B ? 825C00 ? EFFFDBA0 ?
: 825C00 ? 0 ? 0 ?
:ksemgm()+112 CALL ksemem0()+0 B ? EFFFDE58 ? EFFFDBA0 ?
: EFFFF0F0 ? 7F5AF6 ? 4 ?
:opiodr()+3556 PTR_CALL 822C00 ? 5 ? EFFFF0E4 ?
: 825400 ? 0 ? EFFFF0F4 ?
:ttcpip()+4824 PTR_CALL 7C3EB4 ? 0 ? EFFFE0BC ?
:....
:--snipped--
:
:
:
:
:Dump file /oraprog/admin/A/udump/a_ora_14654.trc
:Oracle7 Server Release 7.3.4.0.1 - Production
:With the distributed, replication and parallel query options
:PL/SQL Release 2.3.4.0.0 - Production
:ORACLE_HOME = /oraprog/app/oracle/product/7.3.4
:System name: SunOS
:Node name: emperor
:Release: 5.5.1
:Version: Generic_103640-08
:Machine: sun4u
:Instance name: A
:Redo thread mounted by this instance: 1
:Oracle process number: 18
:Unix process pid: 14654, image: oracleA
:
:*** 1998.04.20.15.11.02.000
:*** SESSION ID:(101.242) 1998.04.20.15.11.02.000
:Exception signal: 10 (SIGBUS), code: 1 (Invalid address alignment), addr: 0x85f899, PC:
:*** 1998.04.20.15.11.02.000
:ksedmp: internal or fatal error
:ORA-07445: exception encountered: core dump [] [SIGBUS] [Invalid address alignment] [8779929] [] []
:No current SQL statement being executed.
:----- Call Stack Trace -----
:calling call entry argument values in hex
:location type point (? means dubious value)
:-------------------- -------- -------------------- ----------------------------
:Cannot find symbol in /usr/lib/libsocket.so.1.
:ksedmp()+156 CALL ksedst()+0 747AD8 ? 747AC4 ? 747AA4 ?
: 44 ? EFFFCBCC ? 0 ?
:ssexhd()+376 CALL ksedmp()+0 3 ? 0 ? 1 ? EFFFD0E4 ? 1 ?
: 6 ?
:EF5B8D18 PTR_CALL A ? 823400 ? EFFFD478 ?
:...
:--snipped--
:
:
:
:
:Dump file /oraprog/admin/A/udump/a_ora_13217.trc
:Oracle7 Server Release 7.3.4.0.1 - Production
:With the distributed, replication and parallel query options
:PL/SQL Release 2.3.4.0.0 - Production
:ORACLE_HOME = /oraprog/app/oracle/product/7.3.4
:System name: SunOS
:Node name: emperor
:Release: 5.5.1
:Version: Generic_103640-08
:Machine: sun4u
:Instance name: A
:Redo thread mounted by this instance: 1
:Oracle process number: 65
:Unix process pid: 13217, image: oracleA
:
:*** 1998.04.20.17.36.44.000
:*** SESSION ID:(96.530) 1998.04.20.17.36.44.000
:********** Internal heap ERROR 17145 addr=0x860a4c *********
:***** Dump of memory around addr 0x860a4c:
:860840 06000200 008603C0 00861C4C 00861BA0 0087F67C
:....
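
For anyone comparing several trace files like the three quoted above,
here is a small sketch that extracts the signal, reason and fault
address from the "Exception signal" line. The regular expression is
written only against the two lines shown in this message and is an
assumption about the format, not a general Oracle trace parser. Note
that the SIGBUS address 0x85f899 is odd, which fits "Invalid address
alignment", and that its decimal value 8779929 is the number shown in
the ORA-07445 line.

```python
import re
from collections import Counter

# Assumed format, based only on the two exception lines quoted above.
EXCEPTION_RE = re.compile(
    r"Exception signal: (\d+) \((\w+)\), code: (\d+) \(([^)]*)\), "
    r"addr: (0x[0-9a-fA-F]+)"
)

def parse_exception(line):
    """Return signal details from one trace line, or None if no match."""
    m = EXCEPTION_RE.search(line)
    if m is None:
        return None
    return {
        "signal": int(m.group(1)),
        "name": m.group(2),
        "code": int(m.group(3)),
        "reason": m.group(4),
        "addr": int(m.group(5), 16),  # fault address as an integer
    }

def summarize(lines):
    """Count exceptions by signal name across any number of trace lines."""
    found = (parse_exception(line) for line in lines)
    return Counter(item["name"] for item in found if item is not None)
```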

---
 Vitaly Beliaev
 Unix Systems Administration. JSC MISW, Magnitogorsk, Russia.
 voice: +7 3511 335639
 mailto://vit@mmk.ru
 http://www.mmk.ru
-===========================================================-


