Troubleshooting with truss (HPUX) Part 1: Get Setup


Today marks the first time I've had to use truss (like strace, but for HP-UX) to troubleshoot a misbehaving executable. We're trying to certify our products on IPv6 and have run into some difficulty on HP-UX 11.23 (11i v2). Our software product uses APR (Apache Portable Runtime) and works fine over IPv4. The IPv6 code path exists, but it gives us the following errors when we try to use a valid IPv6 address:

  • Connect to "fe80::3143:ad80:b056:d705" failed; address family for host not supported.
  • address family for host not supported, attempting to retry.

When we append the network interface name to the end of the address (as some documentation suggests), we see slightly different messages:

  • Connect to "fe80::3143:ad80:b056:d705%lan0" failed; host nor service provided, or not known.
  • host nor service provided, or not known, attempting to retry.

To help root-cause the issue, we've turned to the truss utility, which traces system calls and signals. This article walks through how we set it up and what we learned from our first trace. I look forward to learning more about it.

Next: Troubleshooting with truss (HPUX) Part 2: The answer

Notes:

 

Part 1: Install the depothelper package manager script

  • Download & install the depothelper installer
    • If you need help with HP's swinstall tool, see our article on installing bash on HP-UX for screenshots and navigation assistance
       
  • Verify that depothelper is installed to /usr/local/bin/depothelper

 

Part 2: Install the tusc package (contains truss) 

  • Run this command to install truss:
    • /usr/local/bin/depothelper tusc 

      (This installs truss along with all its dependencies)

  • Verify that truss is installed to /usr/local/bin/truss

 

Part 3: Run your executable using truss and capture a trace file

  • To capture a trace file, execute your program through truss like this:
    • /usr/local/bin/truss -f -r 3,4,5,6 -o /tmp/trace_results /path/to/exe

      (Note: you might need to change 3,4,5,6 to match your needs. YMMV) 
       
  • Explanation of the above command:
    • /usr/local/bin/truss     -     This is where depothelper installs truss
    • -f     -     Follows any forked / vforked processes
    • -r 3,4,5,6     -     Shows the full contents of the I/O buffer for each read() on any of the specified file descriptors (we picked 3,4,5,6 because those appear to be the file descriptors in use when we run our EXE)
    • -o /tmp/trace_results     -     Sets where the output file is stored. The results file is plain text
    • /path/to/exe     -     The EXE you want to run (be sure to add any command line arguments it needs)
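
If you run traces often, it can help to assemble the command in a small script so the descriptor list and output path are easy to change. The following is a minimal sketch in Python; the function name `build_truss_command` and its parameters are made up for illustration, and it only uses the -f, -r and -o options described above. It builds and prints the command rather than running it.

```python
import shlex


def build_truss_command(exe, exe_args=(), read_fds=(3, 4, 5, 6),
                        output="/tmp/trace_results",
                        truss="/usr/local/bin/truss"):
    """Return the truss command line described above as an argv list.

    The helper name and parameters are hypothetical; only the -f, -r and -o
    options from the article are used.
    """
    cmd = [truss, "-f"]
    if read_fds:
        # -r takes a comma-separated list of file descriptors
        cmd += ["-r", ",".join(str(fd) for fd in read_fds)]
    cmd += ["-o", output, exe, *exe_args]
    return cmd


cmd = build_truss_command("/path/to/exe")
print(shlex.join(cmd))
```

To actually launch the trace from the script, the list could be handed to `subprocess.run(cmd)` on the HP-UX host.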

Part 4: Examine the trace_results log file

Every log file will look different. Here is a sample snippet from the log file I was analyzing today. We are focusing on sockets and networking, so we took the first occurrence of socket calls to go over:

7305:    socket(AF_INET, SOCK_DGRAM, 0)                                            = 5
7305:    ioctl(5, SIOCGIFNUM, 0x7af50bbc)                                               = 0
7305:    ioctl(5, SIOCGIFCONF, 0x7af50bb0)                                             = 0
7305:    socket(AF_INET6, SOCK_DGRAM, 0)                                          = 6
7305:    ioctl(6, SIOCGLIFNUM, 0x7af50bc4)                                             = 0
7305:    ioctl(6, SIOCGLIFCONF, 0x7af50ba8)                                           = 0
7305:    ioctl(5, SIOCGIFFLAGS, 0x7af50bd0)                                           = 0
7305:    ioctl(6, SIOCGLIFFLAGS, 0x7af50bf8)                                          = 0
7305:    close(5)                                                                                     = 0
7305:    close(6)                                                                                     = 0

Note: For a crash course on UNIX Sockets, see this article (referenced above in Notes)

Here's an explanation of some of the above lines:


socket(AF_INET, SOCK_DGRAM, 0)   = 5
     socket(     -     *NIX system call that creates a communication endpoint (a socket)
     AF_INET     -     Selects the IPv4 address family
     SOCK_DGRAM     -     Requests a datagram socket (UDP, in the case of AF_INET)
     0)     -     The protocol; 0 selects the default protocol for the given family and socket type

     = 5     -     The return value of the socket() call. It is the file descriptor that identifies the socket (everything in *NIX is treated like a file)
 

ioctl(5, SIOCGIFNUM, 0x7af50bbc)  = 0
     ioctl(     -     *NIX system call for device-specific input/output operations
     5     -     The open file descriptor the request applies to (here, the IPv4 socket opened above)
     SIOCGIFNUM     -     I think this requests the number of network interfaces available (I found few details when searching)
     0x7af50bbc     -     An integer or, more likely here, a pointer to the data that is passed to or filled in by the request

     = 0     -     A return value of 0 indicates that the ioctl call completed without errors

ioctl(5, SIOCGIFCONF, 0x7af50bb0) = 0
     ioctl(     -     *NIX system call for device-specific input/output operations
     5     -     The open file descriptor the request applies to
     SIOCGIFCONF     -     I think this returns a list of interface addresses
     0x7af50bb0     -     An integer or, more likely here, a pointer to the data that is passed to or filled in by the request

     = 0     -     A return value of 0 indicates that the ioctl call completed without errors
 

socket(AF_INET6, SOCK_DGRAM, 0) = 6
     socket(     -     *NIX system call that creates a communication endpoint (a socket)
     AF_INET6     -     Selects the IPv6 address family
     SOCK_DGRAM     -     Requests a datagram socket (UDP, in the case of AF_INET6)
     0)     -     The protocol; 0 selects the default protocol for the given family and socket type

     = 6     -     The return value of the socket() call. It is the file descriptor that identifies the socket (everything in *NIX is treated like a file)
 

close(5) = 0
     This closes the handle to the socket. In this case, file descriptor 5 is closed
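
To see the same socket() / close() pattern outside of truss, here is a minimal sketch in Python. It is an illustration only, not what our product or APR does: it opens one datagram socket per address family, prints the file descriptor it gets back, and closes it. The helper name `open_datagram_socket` is made up, and the descriptor numbers will differ from the 5 and 6 in the trace.

```python
import socket


def open_datagram_socket(family):
    """Open a datagram socket; return None if the family is unavailable."""
    try:
        # Same three arguments as in the trace: family, type, protocol 0
        return socket.socket(family, socket.SOCK_DGRAM, 0)
    except OSError:
        return None


results = {}
for name, family in (("AF_INET", socket.AF_INET),
                     ("AF_INET6", socket.AF_INET6)):
    sock = open_datagram_socket(family)
    if sock is None:
        results[name] = None
        print(f"socket({name}, SOCK_DGRAM, 0) failed")
        continue
    fd = sock.fileno()          # the value truss shows after the '='
    results[name] = fd
    print(f"socket({name}, SOCK_DGRAM, 0) = {fd}")
    sock.close()                # corresponds to close(fd) in the trace
    results[name + "_closed"] = sock.fileno()  # -1 once the socket is closed
```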

Nothing looks wrong in the log snippet above: there are no errors, and every descriptor was closed without problems. At this point we started reading from the end of the log file to find the last occurrence of socket calls. There we did find issues (the lines that end in ERR# instead of a return value):

7305:    socket(AF_INET, SOCK_DGRAM, 0)                                  = 5
7305:    ioctl(5, SIOCGIFCONF, 0x7af503cc)                                   ERR#22 EINVAL
7305:    ioctl(5, SIOCGIFCONF, 0x7af503cc)                                   = 0
7305:    ioctl(5, SIOCGIFCONF, 0x7af503cc)                                   = 0
7305:    close(5)                                                                            = 0
7305:    socket(AF_INET6, SOCK_DGRAM, 0)                                = 5
7305:    ioctl(5, SIOCGIFCONF, 0x7af503cc)                                   ERR#2 ENOENT
7305:    close(5)                                                                            = 0
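
Scanning a long trace by eye gets tedious, so a short script can pull out just the failed calls. The sketch below is Python and assumes every line has the shape seen in this log ("pid: call(args)" followed by either "= value" or "ERR#n NAME"); the names `LINE_RE` and `failed_calls` are made up for illustration. It runs on the sample lines above; for a real trace you would pass it the lines of /tmp/trace_results instead.

```python
import re

# One trace line: "pid:  call(args)  = ret"  or  "pid:  call(args)  ERR#n NAME"
LINE_RE = re.compile(
    r"^\s*(?P<pid>\d+):\s+(?P<call>\w+)\((?P<args>.*)\)\s+"
    r"(?:=\s*(?P<ret>-?\w+)|ERR#(?P<errno>\d+)\s+(?P<errname>\w+))\s*$"
)


def failed_calls(lines):
    """Return (pid, call, args, errno, name) for every line ending in ERR#."""
    failures = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("errno"):
            failures.append((int(m.group("pid")), m.group("call"),
                             m.group("args"), int(m.group("errno")),
                             m.group("errname")))
    return failures


SAMPLE = """\
7305:    socket(AF_INET, SOCK_DGRAM, 0)       = 5
7305:    ioctl(5, SIOCGIFCONF, 0x7af503cc)    ERR#22 EINVAL
7305:    ioctl(5, SIOCGIFCONF, 0x7af503cc)    = 0
7305:    ioctl(5, SIOCGIFCONF, 0x7af503cc)    = 0
7305:    close(5)                             = 0
7305:    socket(AF_INET6, SOCK_DGRAM, 0)      = 5
7305:    ioctl(5, SIOCGIFCONF, 0x7af503cc)    ERR#2 ENOENT
7305:    close(5)                             = 0
"""

failures = failed_calls(SAMPLE.splitlines())
for pid, call, args, number, name in failures:
    print(f"pid {pid}: {call}({args}) failed with {name} ({number})")
```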

Here is what we could find out about these errors:

  • ERR#22 EINVAL     -     Invalid argument. Our reading is that the IPv4 socket cannot handle the IPv6 address we entered (so this error is expected)
  • ERR#2 ENOENT     -     No such file or directory. We take this to mean the call could not handle the IPv6 address we passed in. Further down in the truss output we can see that the following message was displayed:

     address family for host not supported

    For reference, here is how this output appears in the truss log:

    7305:    write(2, "         a d d r", 8)                                                              = 8
    7305:    write(2, " e s s   f a m i", 8)                                                            = 8
    7305:    write(2, " l y   f o r   h", 8)                                                               = 8
    7305:    write(2, " o s t   n o t  ", 8)                                                              = 8
    7305:    write(2, " s u p p o r t e", 8)                                                            = 8
    7305:    write(2, " d .", 2)                                                                            = 2
    7305:    write(2, "\n", 1)                                                                              = 1

Note: The last argument passed to write() is the number of bytes to write, and the return value is the number of bytes actually written. The two should match when the write completes in full.
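
Because the message is split across several write() calls, it is easier to read once it is stitched back together. The following Python sketch does that for the lines above. It assumes, based only on this sample, that truss shows each byte as a two-character cell (a space followed by the character, or a backslash escape such as \n); the names `WRITE_RE`, `decode_cells` and `reassemble` are made up for illustration. It also flags any write whose decoded length, requested count, and return value disagree.

```python
import re

WRITE_RE = re.compile(
    r'write\((?P<fd>\d+),\s*"(?P<data>.*)",\s*(?P<count>\d+)\)\s*=\s*(?P<ret>\d+)'
)
ESCAPES = {"\\n": "\n", "\\t": "\t", "\\r": "\r"}


def decode_cells(data):
    """Decode a buffer in which every byte is shown as a two-column cell."""
    cells = [data[i:i + 2] for i in range(0, len(data), 2)]
    # A cell is either an escape like \n or padding plus one character
    return "".join(ESCAPES.get(cell, cell[-1]) for cell in cells)


def reassemble(lines, fd=2):
    """Join the buffers of all write() calls on fd; list inconsistent lines."""
    parts, mismatches = [], []
    for line in lines:
        m = WRITE_RE.search(line)
        if not m or int(m.group("fd")) != fd:
            continue
        text = decode_cells(m.group("data"))
        if not (len(text) == int(m.group("count")) == int(m.group("ret"))):
            mismatches.append(line.strip())
        parts.append(text)
    return "".join(parts), mismatches


SAMPLE = r'''
7305:    write(2, "         a d d r", 8)    = 8
7305:    write(2, " e s s   f a m i", 8)    = 8
7305:    write(2, " l y   f o r   h", 8)    = 8
7305:    write(2, " o s t   n o t  ", 8)    = 8
7305:    write(2, " s u p p o r t e", 8)    = 8
7305:    write(2, " d .", 2)                = 2
7305:    write(2, "\n", 1)                  = 1
'''

message, mismatches = reassemble(SAMPLE.splitlines())
print(repr(message))
print("inconsistent writes:", mismatches)
```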

Note 2: If you see lines like this:

ioctl(6, TCGETA, 0x7af513b8)                                                              ERR#25 ENOTTY 

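It means that file descriptor 6 is not a terminal: TCGETA asks for terminal settings, and ENOTTY is the standard error when that request is made on a descriptor that isn't a terminal. Programs often make this call simply to check whether they are attached to a console. In our case we know that these are not the cause of our networking issue.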
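
ENOTTY is easy to reproduce, which helps confirm that these lines are harmless. The sketch below is Python and rests on one assumption: that `termios.tcgetattr` issues a terminal-settings query comparable to the TCGETA ioctl in the trace. It makes that query against a pipe, which is not a terminal, and reports the error code. The helper name `terminal_query_errno` is made up for illustration.

```python
import errno
import os
import termios


def terminal_query_errno(fd):
    """Return None if fd is a terminal, else the errno of the failed query."""
    try:
        termios.tcgetattr(fd)   # asks for terminal settings, like TCGETA
    except termios.error as exc:
        return exc.args[0]      # first element is the errno value
    return None


read_end, write_end = os.pipe()  # a pipe is a valid descriptor, not a terminal
try:
    code = terminal_query_errno(read_end)
finally:
    os.close(read_end)
    os.close(write_end)

print(errno.errorcode.get(code, code))
```

On a typical Linux system this is expected to report ENOTTY, the same name truss shows.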

This process helped us narrow down the problem. Now our *NIX developer is writing a debug app that we'll use (hopefully tomorrow) to track this down further. I hope to have more to document on this soon.
