Modifying Python bytecode

一整个雨季 2020-12-10 05:00

I was wondering how to modify bytecode and then recompile it so that I can use it in Python as a function. I've been trying:

a = """
def fact():
    a =         


        
1 Answer
  • 2020-12-10 05:21

    Update: For sundry reasons I have started writing a cross-Python-version assembler; see https://github.com/rocky/python-xasm (it is still in very early beta).

    As far as I know, there is no other currently maintained Python assembler. PEAK's BytecodeAssembler was developed for Python 2.6 and later modified to support early Python 2.7.

    Judging from its documentation it is pretty cool, but it relies on other PEAK libraries, which might be problematic.

    I'll go through the whole example to give you a feel for what you'd have to do. It is not pretty, but then you should expect that.

    Basically, after modifying the bytecode you need to create a new types.CodeType object. You need a new one because most attributes of a code object are read-only, and for good reason: the interpreter may, for example, have cached some of their values.

    Once you have the new code object, you can use it anywhere a code object is accepted, for example in a function object or as the argument to exec or eval.
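
    On Python 3.8 and later, code objects have a replace() method that builds a modified copy without spelling out every constructor argument. The following is only a minimal sketch under that assumption; the function greet and its strings are hypothetical and are not part of the example below.

```python
import types

SOURCE = '''
def greet():
    return "hello"
'''

module_code = compile(SOURCE, "<string>", "exec")

# The function body is stored as a nested code object among the module's constants.
fn_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

# Code objects are immutable, so build a modified copy rather than assigning to attributes.
new_consts = tuple("goodbye" if c == "hello" else c for c in fn_code.co_consts)
patched_code = fn_code.replace(co_consts=new_consts)  # requires Python 3.8+

# Wrap the patched code object in a function so it can be called normally.
greet = types.FunctionType(patched_code, {}, "greet")
result = greet()
print(result)
```

    Only the constants tuple is swapped here; the instruction bytes are untouched, which keeps the sketch independent of the opcode set of any particular Python version.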

    Or you can write it to a bytecode file. Alas, the code object format has changed between Python versions 1.3, 1.5, 2.0, 3.0, and 3.8, and so have the optimizations and the instructions themselves. In fact, since Python 3.6 the instructions are word codes (two bytes each), not bytecodes.
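
    To see which instruction format your own interpreter uses, you can list the instructions with the dis module. This is a small sketch assuming CPython 3.6 or later, where every instruction occupies two bytes; the opcode names printed will differ from version to version.

```python
import dis
import importlib.util
import sys


def fact():
    a = 8
    a = 0
    return a


code = fact.__code__
instructions = list(dis.get_instructions(code))

# With word codes, every instruction starts at an even offset.
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

# The magic number identifies the bytecode version used in .pyc files.
print("Python", sys.version_info[:2], "magic", importlib.util.MAGIC_NUMBER)
```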

    So here is what you'd have to do for your example:

    a = """
    def fact():
        a = 8
        a = 0
        return a
    """
    c = compile(a, '<string>', 'exec')
    fn_code = c.co_consts[0] # Pick up the function code from the main code
    from dis import dis
    dis(fn_code)
    print("=" * 30)
    
    x = fn_code.co_code[6:16]  # modify bytecode: keep offsets 6-15 only, dropping the "a = 8" store (Python 2.7 offsets)
    
    import types
    opt_fn_code = types.CodeType(fn_code.co_argcount,
                                 # fn_code.co_posonlyargcount, Add this in Python 3.8+
                                 # fn_code.co_kwonlyargcount,  Add this in Python3
                                 fn_code.co_nlocals,
                                 fn_code.co_stacksize,
                                 fn_code.co_flags,
                                 x,  # fn_code.co_code: this you changed
                                 fn_code.co_consts,
                                 fn_code.co_names,
                                 fn_code.co_varnames,
                                 fn_code.co_filename,
                                 fn_code.co_name,
                                 fn_code.co_firstlineno,
                                 fn_code.co_lnotab,   # In general, You should adjust this
                                 fn_code.co_freevars,
                                 fn_code.co_cellvars)
    dis(opt_fn_code)
    print("=" * 30)
    print("Result is", eval(opt_fn_code))
    
    # Now let's change the value of what's returned
    co_consts = list(opt_fn_code.co_consts)
    co_consts[-1] = 10
    
    opt_fn_code = types.CodeType(fn_code.co_argcount,
                                 # fn_code.co_posonlyargcount, Add this in Python 3.8+
                                 # fn_code.co_kwonlyargcount,  Add this in Python3
                                 fn_code.co_nlocals,
                                 fn_code.co_stacksize,
                                 fn_code.co_flags,
                                 x,  # fn_code.co_code: this you changed
                                 tuple(co_consts), # this is now changed too
                                 fn_code.co_names,
                                 fn_code.co_varnames,
                                 fn_code.co_filename,
                                 fn_code.co_name,
                                 fn_code.co_firstlineno,
                                 fn_code.co_lnotab,   # In general, You should adjust this
                                 fn_code.co_freevars,
                                 fn_code.co_cellvars)
    
    dis(opt_fn_code)
    print("=" * 30)
    print("Result is now", eval(opt_fn_code))
    

    When I ran this under Python 2.7, here is what I got:

      3           0 LOAD_CONST               1 (8)
                  3 STORE_FAST               0 (a)
    
      4           6 LOAD_CONST               2 (0)
                  9 STORE_FAST               0 (a)
    
      5          12 LOAD_FAST                0 (a)
                 15 RETURN_VALUE
    ==============================
      3           0 LOAD_CONST               2 (0)
                  3 STORE_FAST               0 (a)
    
      4           6 LOAD_FAST                0 (a)
                  9 RETURN_VALUE
    ==============================
    ('Result is', 0)
      3           0 LOAD_CONST               2 (10)
                  3 STORE_FAST               0 (a)
    
      4           6 LOAD_FAST                0 (a)
                  9 RETURN_VALUE
    ==============================
    ('Result is now', 10)
    

    Notice that the line numbers haven't changed even though I removed a couple of instructions from the code. That is because I didn't update fn_code.co_lnotab, which maps bytecode offsets to source lines.
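
    The offset-to-line mapping can be inspected with dis.findlinestarts(), which decodes the line table for you. The sketch below assumes Python 3.8 or later (for the replace() method) and uses hypothetical constants 1000 and 2000; it shows that a patched copy carries the original line table along unless you replace it explicitly.

```python
import dis
import types

SOURCE = "def fact():\n    a = 1000\n    a = 2000\n    return a\n"
module_code = compile(SOURCE, "<string>", "exec")
fn_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))


def line_starts(code):
    """Return (offset, line) pairs, skipping entries that have no line number."""
    return [(off, line) for off, line in dis.findlinestarts(code) if line is not None]


original = line_starts(fn_code)
for offset, line in original:
    print("offset", offset, "-> line", line)

# Patch a constant only; the line table is copied over unchanged.
new_consts = tuple(10 if c == 2000 else c for c in fn_code.co_consts)
patched = fn_code.replace(co_consts=new_consts)
unchanged = line_starts(patched) == original
print("line table unchanged:", unchanged)
```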

    If you now want to write a Python bytecode file from this, here is what you'd do:

    co_consts = list(c.co_consts)
    co_consts[0] = opt_fn_code
    c1 = types.CodeType(c.co_argcount,
                        # c.co_posonlyargcount, Add this in Python 3.8+
                        # c.co_kwonlyargcount,  Add this in Python3
                        c.co_nlocals,
                        c.co_stacksize,
                        c.co_flags,
                        c.co_code,
                        tuple(co_consts),
                        c.co_names,
                        c.co_varnames,
                        c.co_filename,
                        c.co_name,
                        c.co_firstlineno,
                        c.co_lnotab,   # In general, You should adjust this
                        c.co_freevars,
                        c.co_cellvars)
    
    import marshal
    import time
    from struct import pack

    with open('/tmp/testing.pyc', 'wb') as fp:  # binary mode: a .pyc file is not text
        fp.write(pack('Hcc', 62211, '\r', '\n'))  # Python 2.7 magic number
        fp.write(pack('I', int(time.time())))     # timestamp
        # In Python 3.7+ a 4-byte PEP 552 flags field goes between the magic number and the timestamp
        # In Python 3.3+ you also need to write out the source size mod 2**32 here
        fp.write(marshal.dumps(c1))
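
    For comparison, here is a rough sketch of the same step on Python 3.7 or later, where the header is 16 bytes: magic number, PEP 552 flags, timestamp, and source size. The helper name write_pyc and the temporary path are hypothetical, and only the plain timestamp-based variant (flags set to 0) is covered.

```python
import importlib.util
import marshal
import os
import struct
import tempfile
import time

SOURCE = "def fact():\n    return 10\n"
code = compile(SOURCE, "<string>", "exec")


def write_pyc(path, code, source_size=0, mtime=None):
    """Hypothetical helper: write a timestamp-based .pyc (Python 3.7+ layout)."""
    mtime = int(time.time()) if mtime is None else mtime
    with open(path, "wb") as fp:
        fp.write(importlib.util.MAGIC_NUMBER)                   # 4 bytes: magic
        fp.write(struct.pack("<I", 0))                          # 4 bytes: PEP 552 flags
        fp.write(struct.pack("<I", mtime & 0xFFFFFFFF))         # 4 bytes: source mtime
        fp.write(struct.pack("<I", source_size & 0xFFFFFFFF))   # 4 bytes: source size
        fp.write(marshal.dumps(code))


pyc_path = os.path.join(tempfile.mkdtemp(), "testing.pyc")
write_pyc(pyc_path, code, source_size=len(SOURCE.encode("utf-8")))

# Read it back: skip the 16-byte header, unmarshal, and run the module code.
with open(pyc_path, "rb") as fp:
    header = fp.read(16)
    loaded = marshal.load(fp)

namespace = {}
exec(loaded, namespace)
result = namespace["fact"]()
print(result)
```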
    

    To simplify writing the boilerplate bytecode above, I've added a routine to xasm called write_pycfile().

    Now to check the results:

    $ uncompyle6 /tmp/testing.pyc
    # uncompyle6 version 2.9.2
    # Python bytecode 2.7 (62211)
    # Disassembled from: Python 2.7.12 (default, Jul 26 2016, 22:53:31)
    # [GCC 5.4.0 20160609]
    # Embedded file name: <string>
    # Compiled at: 2016-10-18 05:52:13
    
    
    def fact():
        a = 0
    # okay decompiling /tmp/testing.pyc
    $ pydisasm /tmp/testing.pyc
    # pydisasm version 3.1.0
    # Python bytecode 2.7 (62211) disassembled from Python 2.7
    # Timestamp in code: 2016-10-18 05:52:13
    # Method Name:       <module>
    # Filename:          <string>
    # Argument count:    0
    # Number of locals:  0
    # Stack size:        1
    # Flags:             0x00000040 (NOFREE)
    # Constants:
    #    0: <code object fact at 0x7f815843e4b0, file "<string>", line 2>
    #    1: None
    # Names:
    #    0: fact
      2           0 LOAD_CONST               0 (<code object fact at 0x7f815843e4b0, file "<string>", line 2>)
                  3 MAKE_FUNCTION            0
                  6 STORE_NAME               0 (fact)
                  9 LOAD_CONST               1 (None)
                 12 RETURN_VALUE
    
    
    # Method Name:       fact
    # Filename:          <string>
    # Argument count:    0
    # Number of locals:  1
    # Stack size:        1
    # Flags:             0x00000043 (NOFREE | NEWLOCALS | OPTIMIZED)
    # Constants:
    #    0: None
    #    1: 8
    #    2: 10
    # Local variables:
    #    0: a
      3           0 LOAD_CONST               2 (10)
                  3 STORE_FAST               0 (a)
    
      4           6 LOAD_CONST               0 (None)
                  9 RETURN_VALUE
    $
    

    An alternate approach for optimization is to work at the abstract syntax tree (AST) level. The compile function accepts an AST, and the resulting code object can be passed to eval or exec; you can also dump the AST. You could also write it back out as Python source using the Python module astor.
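
    As a rough illustration of the AST route, the sketch below removes the dead "a = 8" store from the example function with an ast.NodeTransformer. The class and helper names are hypothetical, the transformation only handles the narrow case of two consecutive constant assignments to the same name, and it assumes Python 3.8 or later (ast.unparse, used in place of astor, needs 3.9).

```python
import ast

SOURCE = """
def fact():
    a = 8
    a = 0
    return a
"""


def constant_store_target(stmt):
    """Return the name bound by `name = <constant>`, or None for anything else."""
    if (
        isinstance(stmt, ast.Assign)
        and len(stmt.targets) == 1
        and isinstance(stmt.targets[0], ast.Name)
        and isinstance(stmt.value, ast.Constant)
    ):
        return stmt.targets[0].id
    return None


class DropOverwrittenStores(ast.NodeTransformer):
    """Drop `name = <constant>` when the next statement overwrites it the same way."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        kept = []
        for stmt, following in zip(node.body, node.body[1:] + [None]):
            name = constant_store_target(stmt)
            if (
                name is not None
                and following is not None
                and constant_store_target(following) == name
            ):
                continue  # dead store: overwritten by the very next statement
            kept.append(stmt)
        node.body = kept
        return node


tree = DropOverwrittenStores().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)

# compile() accepts the AST directly; exec() then runs the resulting code object.
namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)
result = namespace["fact"]()

if hasattr(ast, "unparse"):  # Python 3.9+
    print(ast.unparse(tree))
```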

    Note, however, that some kinds of optimization, such as tail-recursion elimination, might leave the bytecode in a form that can't be transformed back to source code in a truly faithful way. See my pycon2018 Columbia Lightning Talk for a video I made which eliminates tail recursion in bytecode, to get an idea of what I'm talking about here.
