Locating file descriptor leak in OS X application

问题

Background

I have some very complex application. It is composition of couple libraries. Now QA team found the some problem (something reports an error).
Fromm logs I can see that application is leaking a file descriptors (+1000 after 7 hours of automated tests). QA team has delivered rapport "opened files and ports" from "Activity monitor" and I know exactly to which server connection is not closed.

From full application logs I can see that leak is quite systematic (there is no sudden burst), but I was unable to reproduce issue to see even a small leak of file descriptors.

Problem

Even thou I'm sure for which server connection is never closed, I'm unable to find code responsible. I'm unable reproduce issue.
In logs I can see that all resources my library maintains are properly freed, still server address suggest this is my responsibility or NSURLSession (which is invalidated).

Since there are other libraries and application code it self there is small chance that leak is caused by third party code.

Question

How to locate code responsible for leaking file descriptor? Best candidate is use dtruss which looks very promising. From documentation I can see it can print stack backtraces -s when system API is used.
Problem is that I do not know how to use this in such way that I will not get flooded with information. I need only information who created opened file descriptor and if it was closed destroyed. Since I can't reproduce issue I need a script which could be run by QA team so the could deliver me an output.

If there are other ways to find the source of file descriptor leak please let me know.

There is bunch of predefined scripts which are using dtruss, but I don't see anything what is matching my needs.

Final notes

What is strange the only code I'm aware is using problematic connection, do not use file descriptors directly, but uses custom NSURLSession (configured as: one connection per host, minimum TLS 1.0, disable cookies, custom certificate validation). From logs I can see NSURLSession is invalidated properly. I doubt NSURLSession is source of leak, but currently this is the only candidate.

回答1:

OK, I found out how to do it - on Solaris 11, anyway. I get this output (and yes, I needed root on Solaris 11):

bash-4.1# dtrace -s fdleaks.d -c ./fdLeaker
open( './fdLeaker' ) returned 3
open( './fdLeaker' ) returned 4
open( './fdLeaker' ) returned 5
falloc fp: ffffa1003ae56590, fd: 3, saved fd: 3
falloc fp: ffffa10139d28f58, fd: 4, saved fd: 4
falloc fp: ffffa10030a86df0, fd: 5, saved fd: 5

opened file: ./fdLeaker
leaked fd: 3


              libc.so.1`__systemcall+0x6
              libc.so.1`__open+0x29
              libc.so.1`open+0x84
              fdLeaker`main+0x2b
              fdLeaker`_start+0x72

opened file: ./fdLeaker
leaked fd: 4


              libc.so.1`__systemcall+0x6
              libc.so.1`__open+0x29
              libc.so.1`open+0x84
              fdLeaker`main+0x64
              fdLeaker`_start+0x72

The fdleaks.d dTrace script that finds leaked file descriptors:

#!/usr/sbin/dtrace

/* this will probably need tuning
   note there can be significant performance
   impacts if you make these large */
#pragma D option nspec=4
#pragma D option specsize=128k

#pragma D option quiet

syscall::open*:entry
/ pid == $target /
{
    /* arg1 might not have a physical mapping yet so
       we can't call copyinstr() until open() returns
       and we don't have a file descriptor yet -
       we won't get that until open() returns anyway */
    self->path = arg1;
}

/* arg0 is the file descriptor being returned */
syscall::open*:return
/ pid == $target && arg0 >= 0  && self->path /
{
    /* get a speculation ID tied to this
       file descriptor and start speculative
       tracing */
    openspec[ arg0 ] = speculation();
    speculate( openspec[ arg0 ] );

    /* this output won't appear unless the associated
       speculation id is commited */
    printf( "\nopened file: %s\n", copyinstr( self->path ) );
    printf( "leaked fd: %d\n\n", arg0 );
    ustack();

    /* free the saved path */
    self->path = 0;
}

syscall::close:entry
/ pid == $target && arg0 >= 0 /
{
    /* closing the fd, so discard the speculation
       and free the id by setting it to zero */
    discard( openspec[ arg0 ] );
    openspec[ arg0 ] = 0;
}

/* Solaris uses falloc() to open a file and associate
   the fd with an internal file_t structure

    When the kernel closes file descriptors that the
    process left open, it uses the closeall() function
    which walks the internal structures then calls
    closef() using the file_t *, so there's no way
    to get the original process file descritor in
    closeall() or closef() dTrace probes.

    falloc() is called on open() to associate the
    file_t * with a file descriptor, so this
    saves the pointers passed to falloc()
    that are used to return the file_t * and
    file descriptor once they're filled in
    when falloc() returns */
fbt::falloc:entry
/ pid == $target /
{
   self->fpp = args[ 2 ];
   self->fdp = args[ 3 ];
}


/* Clause-local variables to make casting clearer */
this int fd;
this uint64_t fp;

/* array to associate a file descriptor with its file_t *
   structure in the kernel */
int fdArray[ uint64_t fp ];

fbt::falloc:return
/ pid == $target && self->fpp && self->fdp /
{
    /* get the fd and file_t * values being
       returned to the caller */
    this->fd = ( * ( int * ) self->fdp );
    this->fp = ( * ( uint64_t * ) self->fpp );

    /* associate the fd with its file_t * */
    fdArray[ this->fp ] = ( int ) this->fd;

    /* verification output */
    printf( "falloc fp: %x, fd: %d, saved fd: %d\n", this->fp, this->fd, fdArray[ this->fp ] );
}

/* if this gets called and the dereferenced
   openspec array element is a still-valid
   speculation id, the fd associated with
   the file_t * passed to closef() was never
   closed by the process itself */
fbt::closef:entry
/ pid == $target /
{
    /* commit the speculative tracing since
       this file descriptor was leaked */
    commit( openspec[ fdArray[ arg0 ] ] );
}

First, I wrote this little C program to leak fds:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#include <stdio.h>

#include <unistd.h>

int main( int argc, char **argv )
{
    int ii;

    for ( ii = 0; ii < argc; ii++ )
    {
        int fd = open( argv[ ii ], O_RDONLY );
        fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
        fd = open( argv[ ii ], O_RDONLY );
        fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
        fd = open( argv[ ii ], O_RDONLY );
        fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
        close( fd );
    }
    return( 0 );
}

Then I ran it under this dTrace script to figure out what the kernel does to close orphaned file descriptors, dtrace -s exit.d -c ./fdLeaker:

#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall::rexit:entry
{
    self->exit = 1;
}

syscall::rexit:return
/ self->exit /
{
    self->exit = 0;
}

fbt:::entry
/ self->exit /
{
    printf( "---> %s\n", probefunc );
}

fbt:::return
/ self->exit /
{
    printf( "<--- %s\n", probefunc );
}

That produced a lot of output, and I noticed closeall() and closef() functions, examined the source code, and wrote the dTrace script.

Note also that the process exit dTrace probe on Solaris 11 is the rexit one - that probably changes on OSX.

The biggest problem on Solaris is getting the file descriptor for the file in the kernel code that closes orphaned file descriptors. Solaris doesn't close by file descriptor, it closes by struct file_t pointers in the kernel open files structures for the process. So I had to examine the Solaris source to figure out where the fd is associated with the file_t * - and that's in the falloc() function. The dTrace script associates a file_t * with its fd in an associative array.

None of that is likely to work on OSX.

If you're lucky, the OSX kernel will close orphaned file descriptors by the file descriptor itself, or at least provide something that tells you the fd is being closed, perhaps an auditing function.

来源：https://stackoverflow.com/questions/47159408/locating-file-descriptor-leak-in-os-x-application

标签

macos

nsurlsession

file-descriptor

dtrace

dtruss