How to find global static initializations

梦谈多话 2021-01-03 08:00

I just read this excellent article: http://neugierig.org/software/chromium/notes/2011/08/static-initializers.html and then I tried: https://gcc.gnu.org/onlinedocs/gccint/Ini

2 Answers
  • 2021-01-03 08:03

    As you already observed, the implementation details of constructors/initialization functions are highly dependent on the compiler and its version. I am not aware of a tool for this, but what current GCC/clang versions do is simple enough for a small script to handle: .init_array is just a list of entry points. objdump -s can be used to dump that list, and nm to look up the symbol names. Here's a Python script that does that. It should work for any binary generated by those compilers:

    #!/usr/bin/env python
    import os
    import sys
    
    # Load .init_array section
    objdump_output = os.popen("objdump -s '%s' -j .init_array" % (sys.argv[1].replace("'", r"\'"),)).read()
    is_64bit = "x86-64" in objdump_output
    init_array = objdump_output[objdump_output.find("Contents of section .init_array:") + 33:]
    initializers = []
    for line in init_array.split("\n"):
        parts = line.split()
        if not parts:
            continue
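        # Keep only the hex columns that follow the offset. The trailing ASCII
        # column can itself contain spaces, so popping one item is not enough.
        parts = [p for p in parts[1:5] if len(p) == 8 and all(c in "0123456789abcdef" for c in p)]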
    
        if is_64bit:
            # 64bit pointers are 8 bytes long
            parts = [ "".join(parts[i:i+2]) for i in range(0, len(parts), 2) ]
    
        # Fix endianness
        parts = [ "".join(reversed([ x[i:i+2] for i in range(0, len(x), 2) ])) for x in parts ]
    
        initializers += parts
    
    # Load disassembly for c++ constructors
    dis_output = os.popen("objdump -d '%s' | c++filt" % (sys.argv[1].replace("'", r"\'"), )).read()
    def find_associated_constructor(disassembly, symbol):
        # Find associated __static_initialization function
        loc = disassembly.find("<%s>" % symbol)
        if loc < 0:
            return False
        loc = disassembly.find(" <", loc)
        if loc < 0:
            return False
        symbol = disassembly[loc+2:disassembly.find("\n", loc)][:-1]
        if symbol[:23] != "__static_initialization":
            return False
        address = disassembly[disassembly.rfind(" ", 0, loc)+1:loc]
        loc = disassembly.find("%s <%s>" % (address, symbol))
        if loc < 0:
            return False
        # Find all callq's in that function
        end_of_function = disassembly.find("\n\n", loc)
        symbols = []
        while loc < end_of_function:
            loc = disassembly.find("callq", loc)
            if loc < 0 or loc > end_of_function:
                break
            loc = disassembly.find("<", loc)
            symbols.append(disassembly[loc+1:disassembly.find("\n", loc)][:-1])
        return symbols
    
    # Load symbol names, if available
    nm_output = os.popen("nm '%s'" % (sys.argv[1].replace("'", r"\'"), )).read()
    nm_symbols = {}
    for line in nm_output.split("\n"):
        parts = line.split()
        if not parts:
            continue
        nm_symbols[parts[0]] = parts[-1]
    
    # Output a list of initializers
    print("Initializers:")
    for initializer in initializers:
        symbol = nm_symbols[initializer] if initializer in nm_symbols else "???"
        constructor = find_associated_constructor(dis_output, symbol)
        if constructor:
            for function in constructor:
                print("%s %s -> %s" % (initializer, symbol, function))
        else:
            print("%s %s" % (initializer, symbol))
    

    C++ static initializers are not called directly, but through two generated functions, _GLOBAL__sub_I_... and __static_initialization... The script uses the disassembly of those functions to find the name of the actual constructor. You'll need the c++filt tool to demangle the names; alternatively, remove that call from the script to see the raw symbol names.
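
    As a rough cross-check of the naming described above, the generated wrappers can also be picked out of plain nm output by their prefix. The sketch below assumes nm's default three-column format (address, type, name) and mangled names, i.e. nm run without c++filt. The helper find_global_initializers and the sample symbols are hypothetical, made up for illustration.

```python
GLOBAL_INIT_PREFIX = "_GLOBAL__sub_I_"

def find_global_initializers(nm_output):
    """Return (address, name) pairs for symbols that look like
    compiler-generated static-initialization wrappers."""
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined symbols have no address column; skip anything that
        # is not exactly "address type name".
        if len(parts) != 3:
            continue
        address, _symbol_type, name = parts
        if name.startswith(GLOBAL_INIT_PREFIX):
            found.append((address, name))
    return found

# Hypothetical nm output, for illustration only.
SAMPLE_NM = """\
0000000000401136 T main
0000000000401190 t _GLOBAL__sub_I_main
                 U __cxa_atexit
0000000000401152 t _Z41__static_initialization_and_destruction_0ii
"""

for address, name in find_global_initializers(SAMPLE_NM):
    print(address, name)
```

    Every address returned this way should also show up in .init_array if the prefix-based view and the section-based view agree.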

    Shared libraries can have their own initializer lists, which this script would not display. The situation is slightly more complicated there: for non-static initializers, .init_array gets an all-zero entry that is overwritten with the final address of the initializer when the library is loaded. For such entries, this script would print an address of all zeros.
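
    The all-zero entries mentioned above are easy to spot once the section contents are decoded as integers. This is a minimal sketch, assuming the raw bytes of .init_array are already available by some means; decode_init_array and the sample bytes are hypothetical and not part of the script above.

```python
import struct

def decode_init_array(data, pointer_size=8, little_endian=True):
    """Decode raw .init_array bytes into a list of integer addresses."""
    if pointer_size not in (4, 8):
        raise ValueError("pointer_size must be 4 or 8")
    if len(data) % pointer_size:
        raise ValueError("section size is not a multiple of the pointer size")
    count = len(data) // pointer_size
    byte_order = "<" if little_endian else ">"
    pointer_code = "Q" if pointer_size == 8 else "I"
    return list(struct.unpack(byte_order + pointer_code * count, data))

# Hypothetical section contents: one resolved entry and one all-zero entry.
sample = bytes.fromhex("9011400000000000" "0000000000000000")
for address in decode_init_array(sample):
    if address == 0:
        print("0x0 (placeholder, presumably filled in at load time)")
    else:
        print(hex(address))
```

    For a zero entry, the real target would have to be recovered from the relocation table, for instance by inspecting the output of readelf -r.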

  • 2021-01-03 08:25

    Several things are executed when an ELF object is loaded, not just .init_array. To get an overview, I suggest looking at the sources of libc's loader, especially _dl_init() and call_init().
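
    As a starting point before reading the loader sources, the dynamic section already shows which initialization hooks an object declares. The sketch below parses text in the format printed by readelf -d; the helper find_init_hooks and the sample output are hypothetical. The order used here (DT_PREINIT_ARRAY, then DT_INIT, then DT_INIT_ARRAY) is an assumption about how glibc handles a single object and should be verified against _dl_init() and call_init().

```python
import re

# Init-related dynamic tags, in the order glibc is assumed to process
# them for one object (verify against _dl_init() and call_init()).
INIT_TAGS = ["PREINIT_ARRAY", "INIT", "INIT_ARRAY"]

# Matches lines such as " 0x000000000000000c (INIT)   0x401000".
ENTRY_RE = re.compile(r"^\s*0x[0-9a-fA-F]+\s+\((\w+)\)\s+(\S+)")

def find_init_hooks(readelf_dynamic_output):
    """Return [(tag, value)] for the init-related entries that are present."""
    entries = {}
    for line in readelf_dynamic_output.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries[match.group(1)] = match.group(2)
    return [(tag, entries[tag]) for tag in INIT_TAGS if tag in entries]

# Hypothetical readelf -d excerpt, for illustration only.
SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x401000
 0x0000000000000019 (INIT_ARRAY)         0x403e00
 0x000000000000001b (INIT_ARRAYSZ)       16 (bytes)
"""

for tag, value in find_init_hooks(SAMPLE):
    print(tag, value)
```

    An object that reports INIT in addition to INIT_ARRAY runs code that a script looking only at .init_array will never list.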
