# Interim Summary : Solaris TRUSS and pfiles Problem Solving Methodology for Failed Processes under Solaris < 10

From: Steven Sim <steven.sim_at_faplccc.net>
Date: Wed Dec 07 2005 - 09:52:05 EST
Hello All;

THANKS TO ALL who have replied! Truly this is a GREAT list!

I have learnt a lot and this summary represents my small effort in
giving something back.

I would like to thank Brett Lymn, Sebastien Daubigne, Lucien Hercau and

First Question (Truss interpretation) (Reply By Sebastien Daubigne)
-------------------------------------------------------------------
Original truss output;

15771/3:         6.4546 getpid()                                        = 15771 [1]
15771/3:         6.4547 door_call(14, 0xFDC00B78)                       = 0
15771/3:         6.4548 close(14)                                       = 0
Err#11 EAGAIN*
15771/1:         6.4551 lwp_sema_post(0xFDC01E60)                       = 0
15771/3:         6.4557 lwp_sema_wait(0xFDC01E60)                       = 0
15771/1:         6.4558 lwp_mutex_wakeup(0xFF0F3500)                    = 0
15771/3:         6.4558 lwp_mutex_lock(0xFF0F3500)                      = 0
*15771/3:         6.4559 write(13, " 0\f020102 e07\n01  04\0".., 14)     = 14*

My original assertion (INCORRECT!) is given below;

"What caught my eye immediately was the Err#11 EAGAIN returned from the

I then went to /usr/include/sys and did a

$ grep EAGAIN *
errno.h:#define EAGAIN  11      /* Resource temporarily unavailable     */
errno.h:#define EWOULDBLOCK     EAGAIN

So now I knew what EAGAIN meant; next I needed to know what resource was "temporarily unavailable"."
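The same errno lookup can be done without grepping the headers. This is a minimal Python sketch, not part of the original thread; `describe_errno` is a hypothetical helper name, and the message text it returns depends on the platform it runs on.

```python
import errno
import os


def describe_errno(number):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    # truss printed "Err#11 EAGAIN"; look the symbol up on the local platform.
    print(describe_errno(errno.EAGAIN))
    # The errno.h grep above shows EWOULDBLOCK defined as EAGAIN on Solaris;
    # other platforms may or may not share the value.
    print(errno.EWOULDBLOCK == errno.EAGAIN)
```

The numeric value of EAGAIN differs between operating systems, so the sketch uses the symbolic constant rather than hard-coding 11.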

I don't think the socket is "temporarily unavailable"; rather, I think this is a non-blocking read() on the socket. As there is no data to read on the socket, read() returns EAGAIN.

When attempting to read a file associated with a socket or a
stream  that  is not a pipe, a FIFO, or a terminal,  and the
file has no data currently available:

o  If O_NDELAY or O_NONBLOCK is set,  read()  returns  -1
and sets errno to EAGAIN.
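The behaviour described in that excerpt can be reproduced with a small Python sketch. It is an illustration only: a local `socketpair()` stands in for the TCP connection on fd 13 (an assumption), and `try_read` and `demo` are hypothetical names.

```python
import errno
import os
import socket


def try_read(fd, nbytes=1024):
    """Issue a single read(2) on fd.

    Returns (data, None) on success or (None, errno_value) on failure.
    """
    try:
        return os.read(fd, nbytes), None
    except OSError as exc:
        return None, exc.errno


def demo():
    # Assumption: a socketpair stands in for the connection seen on fd 13.
    reader, writer = socket.socketpair()
    reader.setblocking(False)  # sets O_NONBLOCK on the descriptor
    try:
        empty = try_read(reader.fileno())  # nothing queued yet
        writer.sendall(b"ping")
        ready = try_read(reader.fileno())  # data is now available
        return empty, ready
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    empty, ready = demo()
    print("read with no data ->", errno.errorcode.get(empty[1]))
    print("read with data    ->", ready[0])
```

The first read fails with EAGAIN only because nothing has been sent yet; the socket itself is perfectly healthy, which is the point of the correction above.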

Let's look at the flags on fd 13 (below); it has the O_NONBLOCK flag set:

>> 13: S_IFSOCK mode:0666 dev:305,0 ino:51163 uid:0 gid:0 size:0
>>       O_RDWR|O_NONBLOCK
>>         sockname: AF_INET 127.0.0.1  port: 389
>>         peername: AF_INET 127.0.0.1  port: 45568
>
This is an IP socket (AF_INET): local port 389, remote port 45568.
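A process can inspect the same details on its own descriptors. The sketch below is a rough Python analogue of the flag and address lines in the pfiles output, not a description of how pfiles itself works; `describe_socket` and `demo` are hypothetical names, and the demo uses a throwaway loopback listener on an ephemeral port rather than the port 389 service.

```python
import fcntl
import os
import socket


def describe_socket(sock):
    """Return the O_NONBLOCK state and endpoint addresses of a socket."""
    flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFL)
    return {
        "nonblocking": bool(flags & os.O_NONBLOCK),
        "sockname": sock.getsockname(),  # local address, like "sockname:"
        "peername": sock.getpeername(),  # remote address, like "peername:"
    }


def demo():
    # Assumption: a temporary listener on 127.0.0.1 with an ephemeral port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    server.setblocking(False)
    try:
        return describe_socket(server)
    finally:
        for s in (client, server, listener):
            s.close()


if __name__ == "__main__":
    print(demo())
```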

2nd Question. (pfiles output interpretation) (Reply By Sebastien Daubigne)
-------------------------------------------------------------------------

10: S_IFSOCK mode:0666 dev:305,0 ino:57121 uid:0 gid:0 size:0
O_RDWR|O_NONBLOCK
sockname: AF_INET 127.0.0.1  port: 389
peername: AF_INET 127.0.0.1  port: 45556
11: S_IFSOCK mode:0666 dev:305,0 ino:30118 uid:0 gid:0 size:0
O_RDWR|O_NONBLOCK
sockname: AF_INET 127.0.0.1  port: 389
peername: AF_INET 127.0.0.1  port: 45544
12: S_IFSOCK mode:0666 dev:305,0 ino:36574 uid:0 gid:0 size:0
O_RDWR|O_NONBLOCK
sockname: AF_INET 127.0.0.1  port: 389
peername: AF_INET 127.0.0.1  port: 45546
13: S_IFSOCK mode:0666 dev:305,0 ino:51163 uid:0 gid:0 size:0
O_RDWR|O_NONBLOCK
sockname: AF_INET 127.0.0.1  port: 389
peername: AF_INET 127.0.0.1  port: 45568

>>
>> How do I properly interpret the pfiles output listed above?
>>
>> fd 13 is a S_IFSOCK. Can the gurus on this list advise me how I can link
>> the above fd (dev 305,0 ino 51163) to an actual socket or file on the
>> system?
>
>dev X,Y means a special file with major number X and minor number Y
>
>To find the device associated with the file, you can use the "find" command:
>
>find /dev /devices -type b -o -type c| xargs ls -ld | grep 'X, *Y'
>
>Once you have the device, you can find the filesystem (if any) with "df",
>then search for the inode number with another find command:
>
>find /filesystem -inum <inode_number>
>
>Note that the "lsof" tool will do this job for you.
>
>
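The two find commands in the reply above can also be sketched in Python. This is a hedged illustration, not a replacement for find or lsof; `find_device_nodes` and `find_by_inode` are hypothetical helper names, and like the original find commands they walk whole directory trees and may be slow on large filesystems.

```python
import os
import stat


def find_device_nodes(roots, major, minor):
    """Yield block/character device nodes whose device number is (major, minor)."""
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)
                except OSError:
                    continue  # entry vanished or is unreadable
                if stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode):
                    dev = (os.major(st.st_rdev), os.minor(st.st_rdev))
                    if dev == (major, minor):
                        yield path


def find_by_inode(top, inode):
    """Yield paths under top whose inode number equals inode (like find -inum)."""
    for dirpath, dirs, files in os.walk(top):
        for name in dirs + files:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_ino == inode:
                    yield path
            except OSError:
                continue


if __name__ == "__main__":
    # Values taken from the pfiles line "dev:305,0"; this may print nothing.
    for node in find_device_nodes(["/dev", "/devices"], 305, 0):
        print(node)
```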
Another pfiles explanation by Adrian Saul;
-----------------------------------

13: S_IFSOCK mode:0666 dev:305,0 ino:51163 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK
>         sockname: AF_INET 127.0.0.1  port: 389
>         peername: AF_INET 127.0.0.1  port: 45568
>
> How do I properly interpret the pfiles output listed above?
>
> fd 13 is a S_IFSOCK. Can the gurus on this list advise me how I can link
> the above fd (dev 305,0 ino 51163) to an actual socket or file on the
> system?

It lists it right below. It is a TCP socket between ports 389 and 45568
on the localhost - I would hazard a guess that it is an LDAP connection.

What you are seeing is an attempt to read from the socket where no data
is present to be read.   If this is a custom application, you should
probably use poll() to see if there is data waiting and then issue the read.
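The poll-then-read pattern suggested here might look like the following Python sketch. It is an assumption about how a custom application could be structured, not code from the application in question; `read_when_ready` is a hypothetical name and the one-second timeout is arbitrary.

```python
import os
import select
import socket


def read_when_ready(fd, nbytes=1024, timeout_ms=1000):
    """Wait up to timeout_ms for fd to become readable, then read once.

    Returns the bytes read, or None if the timeout expired with no data.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    if not events:
        return None  # nothing to read; no EAGAIN was ever generated
    return os.read(fd, nbytes)


if __name__ == "__main__":
    # Assumption: a socketpair stands in for the real connection.
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    print(read_when_ready(reader.fileno(), timeout_ms=100))  # None
    writer.sendall(b"reply")
    print(read_when_ready(reader.fileno()))  # b'reply'
    reader.close()
    writer.close()
```

With this structure the read is only issued once poll() reports data, so the EAGAIN lines would no longer appear in the truss output.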

3rd Question. write function return code interpretation (Reply By Sebastien Daubigne)
--------------------------------------------------------------------------------------

>>
>> In the above example, the write function also returned a status > 0
>> (14).
>>
>> What is the proper methodology for finding out what this write
>> function
>> returned code 14 signify?
>
man -s 2 write : On success, the write(2) syscall returns the number of
bytes written, so 14 bytes were written here. On error, it returns -1
and sets errno.
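To see the same return-value convention from user code, here is a small Python sketch. The 14-byte payload is a placeholder, not the data from the truss line, and `write_all` is a hypothetical helper. Note that Python's `os.write` raises `OSError` where the C call would return -1.

```python
import os


def write_all(fd, data):
    """Write all of data to fd, looping on short writes.

    Returns the total number of bytes written.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write returns how many bytes the kernel accepted this time.
        total += os.write(fd, view[total:])
    return total


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    payload = b"x" * 14  # placeholder payload of 14 bytes
    print(os.write(write_end, payload))  # number of bytes written
    os.close(read_end)
    os.close(write_end)
```

A return value smaller than the requested length is a short write rather than an error, which is why the helper loops until everything has been written.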

>>
>> One last question, the return code is in Hex right? Not Decimal?
>
Truss prints return codes in decimal.

Warmest Regards

Steven Sim

Fujitsu Asia Pte. Ltd.
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Wed Dec 7 09:55:21 2005
