Why were literal formatted strings (f-strings) so slow in Python 3.6 alpha? (now fixed in 3.6 stable)

野性不改 2020-12-24 11:47

I've downloaded a Python 3.6 alpha build from the Python GitHub repository, and one of my favourite new features is literal string formatting. It can be used like so:

2 answers
  •  隐瞒了意图╮
    2020-12-24 12:05

    Before 3.6 beta 1, the format string f'x is {x}' was compiled to the equivalent of ''.join(['x is ', x.__format__('')]). The resulting bytecode was inefficient for several reasons:

    1. it built a sequence of string fragments...
    2. ... and this sequence was a list, not a tuple! (it is slightly faster to construct tuples than lists).
    3. it pushed an empty string onto the stack
    4. it looked up the join method on the empty string
    5. it invoked __format__ even on bare Unicode objects, for which __format__('') always returns self, and on integer objects, for which __format__('') returns str(self).
    6. the __format__ method isn't slotted.
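As a rough illustration of the points above, the old compilation strategy can be spelled out by hand in plain Python. This is only a sketch of the equivalent expression quoted in this answer, not the actual bytecode the alpha builds produced; the variable names are made up for the example.

```python
x = 42

# Hand-written equivalent of what the pre-beta compiler produced:
# build a list of fragments, look up join on '', call __format__('').
old_style = ''.join(['x is ', x.__format__('')])

# The same result written as a literal formatted string.
new_style = f'x is {x}'

# Point 5: with an empty format spec, a str formats to itself
# and an int formats to the same text as str().
text = 'abc'
text_formatted = text.__format__('')
int_formatted = (42).__format__('')

print(old_style, new_style, text_formatted, int_formatted)
```

Both spellings produce the same string; the difference discussed here is only in how much work the interpreter does to get there.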

    However, for a longer and more complex string, a literal formatted string would still have been faster than the corresponding '...'.format(...) call, because the latter has to parse the format string again every time it is formatted.


    This very question was the prime motivator for issue 27078, which asked for a new Python bytecode opcode for joining string fragments into a string (the opcode takes one operand, the number of fragments on the stack; the fragments are pushed onto the stack in order of appearance, i.e. the last part is the topmost item). Serhiy Storchaka implemented this new opcode and merged it into CPython, so it has been available in Python 3.6 since beta 1 (and thus in Python 3.6.0 final).
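One way to check for the opcode described above on your own interpreter is to inspect the instructions with the dis module. This is a minimal sketch assuming CPython 3.6 or later; the helper name build_string_operands is invented for the example, and the surrounding opcodes differ between Python versions.

```python
import dis


def build_string_operands(source):
    """Return the operand of every BUILD_STRING instruction in `source`."""
    return [
        ins.arg
        for ins in dis.get_instructions(source)
        if ins.opname == 'BUILD_STRING'
    ]


# Two fragments ('X is ' and the formatted x) are on the stack,
# so the opcode's single operand should be 2.
fstring_operands = build_string_operands("f'X is {x}'")

# The hand-written join goes through a method call instead.
join_operands = build_string_operands("''.join(['X is ', format(x)])")

print(fstring_operands, join_operands)
```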

    As a result, literal formatted strings are much faster than str.format. In Python 3.6 they are also often much faster than old-style % formatting, if you're just interpolating str or int objects:

    >>> import timeit
    >>> timeit.timeit("x = 2; 'X is {}'.format(x)")
    0.32464265200542286
    >>> timeit.timeit("x = 2; 'X is %s' % x")
    0.2260766440012958
    >>> timeit.timeit("x = 2; f'X is {x}'")
    0.14437875000294298
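To repeat this comparison on your own machine, something like the following script could be used. It is a sketch, not the setup used for the numbers above: the iteration counts and the benchmark helper are assumptions, and the absolute timings depend on the hardware and the Python version.

```python
import timeit

SETUP = "x = 2"
STATEMENTS = {
    "str.format": "'X is {}'.format(x)",
    "%-formatting": "'X is %s' % x",
    "f-string": "f'X is {x}'",
}


def benchmark(number=200_000, repeat=5):
    """Return the best-of-`repeat` time in seconds for each statement."""
    return {
        label: min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))
        for label, stmt in STATEMENTS.items()
    }


if __name__ == "__main__":
    for label, seconds in benchmark().items():
        print(f"{label:>13}: {seconds:.4f} s")
```

Taking the minimum over several repeats reduces noise from other processes; the assignment is moved into the setup so that only the formatting itself is timed.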
    

    f'X is {x}' now compiles to

    >>> import dis
    >>> dis.dis("f'X is {x}'")
      1           0 LOAD_CONST               0 ('X is ')
                  2 LOAD_NAME                0 (x)
                  4 FORMAT_VALUE             0
                  6 BUILD_STRING             2
                  8 RETURN_VALUE
    

    The new BUILD_STRING opcode, along with an optimization in the FORMAT_VALUE code, completely eliminates the first 5 of the 6 sources of inefficiency. The __format__ method still isn't slotted, so it requires a dictionary lookup on the class, and calling it is therefore necessarily slower than calling __str__. However, the call can now be avoided entirely in the common cases of formatting int or str instances (not subclasses!) without format specifiers.
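The "not subclasses" caveat can be observed from Python code. The sketch below assumes CPython 3.6 or later and uses a hypothetical CountingInt class that counts how often its __format__ is invoked; the fast path for plain int cannot be observed this way, only the fact that a subclass still goes through __format__.

```python
class CountingInt(int):
    """Hypothetical int subclass that counts calls to __format__."""

    format_calls = 0

    def __format__(self, spec):
        CountingInt.format_calls += 1
        return super().__format__(spec)


plain = 2
sub = CountingInt(2)

plain_result = f'X is {plain}'  # exact int: no user-visible __format__ call
sub_result = f'X is {sub}'      # subclass: __format__ is looked up and called

print(plain_result, sub_result, CountingInt.format_calls)
```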
