Python string formatting: is '%' more efficient than 'format' function?

一个人的身影 2020-12-04 23:33

I wanted to compare different ways to build a string in Python from different variables:

  • using + to concatenate (referred to as 'plus')
  • using
1 Answer
  • 2020-12-05 00:17
    1. Yes, % string formatting is faster than the .format method.
    2. Most likely (there may be a much better explanation) because % is an operator that the interpreter dispatches directly (hence fast execution), whereas .format involves at least one extra attribute lookup and method call.
    3. Because attribute value access also involves an extra method call, viz. __getattribute__ (with __getattr__ only as the fallback when normal lookup fails).
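
To make the first two points concrete, here is a minimal sketch that times the three styles on the same two values. The variable names `name` and `count`, the sample values, and the repetition count are my own illustrative choices and are not part of the measurements in this answer; absolute numbers will differ by machine and Python version.

```python
from timeit import timeit

# Hypothetical values, chosen only for illustration.
setup = "name = 'spam'; count = 3"

statements = {
    'percent': "'%s has %s items' % (name, count)",
    'dot_format': "'{} has {} items'.format(name, count)",
    'f_string': "f'{name} has {count} items'",
}

# Evaluate each statement once to confirm that all three build the same string;
# otherwise the timing comparison would not be fair.
namespace = {}
exec(setup, namespace)
results = {label: eval(stmt, namespace) for label, stmt in statements.items()}

# number is kept small so the sketch runs quickly; raise it for more stable timings.
timings = {
    label: timeit(stmt, setup, number=100_000)
    for label, stmt in statements.items()
}

for label, seconds in timings.items():
    print(f'{label:<10} {seconds:.4f} s  ->  {results[label]}')
```

Only the relative ordering of the three timings is of interest here, and even that can shift between Python versions.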

    I ran a slightly more thorough analysis (on Python 3.8.2) of various formatting methods using timeit. The results are as follows (pretty-printed with BeautifulTable); each value is the time in seconds for timeit's default of 1,000,000 executions -

    +-----------------+-------+-------+-------+-------+-------+--------+
    | Type \ num_vars |   1   |   2   |   5   |  10   |  50   |  250   |
    +-----------------+-------+-------+-------+-------+-------+--------+
    |    f_str_str    | 0.056 | 0.063 | 0.115 | 0.173 | 0.754 | 3.717  |
    +-----------------+-------+-------+-------+-------+-------+--------+
    |    f_str_int    | 0.055 | 0.148 | 0.354 | 0.656 | 3.186 | 15.747 |
    +-----------------+-------+-------+-------+-------+-------+--------+
    |   concat_str    | 0.012 | 0.044 | 0.169 | 0.333 | 1.888 | 10.231 |
    +-----------------+-------+-------+-------+-------+-------+--------+
    |    pct_s_str    | 0.091 | 0.114 | 0.182 | 0.313 | 1.213 | 6.019  |
    +-----------------+-------+-------+-------+-------+-------+--------+
    |    pct_s_int    | 0.09  | 0.141 | 0.248 | 0.479 | 2.179 | 10.768 |
    +-----------------+-------+-------+-------+-------+-------+--------+
    | dot_format_str  | 0.143 | 0.157 | 0.251 | 0.461 | 1.745 | 8.259  |
    +-----------------+-------+-------+-------+-------+-------+--------+
    | dot_format_int  | 0.141 | 0.192 | 0.333 | 0.62  | 2.735 | 13.298 |
    +-----------------+-------+-------+-------+-------+-------+--------+
    | dot_format2_str | 0.159 | 0.195 | 0.33  | 0.634 | 3.494 | 18.975 |
    +-----------------+-------+-------+-------+-------+-------+--------+
    | dot_format2_int | 0.158 | 0.227 | 0.422 | 0.762 | 4.337 | 25.498 |
    +-----------------+-------+-------+-------+-------+-------+--------+
    

    The trailing _str and _int indicate the type of the values the operation was carried out on (strings and integers, respectively).

    Kindly note that the concat_str result for a single variable is essentially just the string itself, so it shouldn't really be considered.
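
A quick way to see this: rebuilding the concat statement the same way the benchmark setup in this answer does shows that the one-variable case contains no + at all, so timeit only measures a name lookup. `concat_expr` is a helper name introduced here purely for illustration.

```python
def concat_expr(num_vars):
    """Build the concatenation statement the same way the benchmark setup does."""
    return '+'.join(f'x{i}' for i in range(num_vars))


one = concat_expr(1)
two = concat_expr(2)

print(one)  # x0
print(two)  # x0+x1
```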

    My setup for arriving at the results -

    from timeit import timeit
    from beautifultable import BeautifulTable  # pip install beautifultable
    
    times = {}
    
    for num_vars in (250, 50, 10, 5, 2, 1):
        f_str = "f'{" + '}{'.join([f'x{i}' for i in range(num_vars)]) + "}'"
        # "f'{x0}{x1}'"
        concat = '+'.join([f'x{i}' for i in range(num_vars)])
        # 'x0+x1'
        pct_s = '"' + '%s'*num_vars + '" % (' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
        # '"%s%s" % (x0,x1)'
        dot_format = '"' + '{}'*num_vars + '".format(' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
        # '"{}{}".format(x0,x1)'
        dot_format2 = '"{' + '}{'.join([f'{i}' for i in range(num_vars)]) + '}".format(' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
        # '"{0}{1}".format(x0,x1)'
    
        vars = ','.join([f'x{i}' for i in range(num_vars)])
        vals_str = tuple(map(str, range(num_vars))) if num_vars > 1 else '0'
        # !r keeps the quotes in the single-variable case, so x0 is the str '0', not the int 0
        setup_str = f'{vars} = {vals_str!r}'
        # "x0,x1 = ('0', '1')"
        vals_int = tuple(range(num_vars)) if num_vars > 1 else 0
        setup_int = f'{vars} = {vals_int}'
        # 'x0,x1 = (0, 1)'
    
        times[num_vars] = {
            'f_str_str': timeit(f_str, setup_str),
            'f_str_int': timeit(f_str, setup_int),
            'concat_str': timeit(concat, setup_str),
            # 'concat_int': timeit(concat, setup_int), # this will be summation, not concat
            'pct_s_str': timeit(pct_s, setup_str),
            'pct_s_int': timeit(pct_s, setup_int),
            'dot_format_str': timeit(dot_format, setup_str),
            'dot_format_int': timeit(dot_format, setup_int),
            'dot_format2_str': timeit(dot_format2, setup_str),
            'dot_format2_int': timeit(dot_format2, setup_int),
        }
    
    table = BeautifulTable()
    # Sort the keys so the headers line up with the row order used below
    # (the dict itself is filled in the order 250, 50, ..., 1).
    table.column_headers = ['Type \\ num_vars'] + list(map(str, sorted(times)))
    for key in ('f_str_str', 'f_str_int', 'concat_str', 'pct_s_str', 'pct_s_int', 'dot_format_str', 'dot_format_int', 'dot_format2_str', 'dot_format2_int'):
        table.append_row([key] + [times[num_vars][key] for num_vars in sorted(times)])
    print(table)
    

    I couldn't go beyond num_vars=250 because of the max arguments (255) limit with timeit.

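    tl;dr - Python string formatting performance: in these timings f-strings are the fastest formatting option for string values and the most elegant, while % is faster for integer values once there are two or more variables. At times (due to some implementation restrictions & f-strings being Py3.6+ only), you might have to use other formatting options anyway.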
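
One case where an f-string does not fit, sketched under my own assumptions: when the template is defined separately from the values (for example, loaded from a config file), it has to be filled in later, and an f-string cannot do that because it is evaluated at the point where it is written. The template texts and the `render` helper below are hypothetical.

```python
# Hypothetical templates, e.g. read from a config file at startup.
TEMPLATE = 'Hello {user}, you have {count} new messages'
LEGACY_TEMPLATE = 'Hello %(user)s, you have %(count)d new messages'


def render(template, **values):
    """Fill a str.format-style template with values that arrive later."""
    return template.format(**values)


message = render(TEMPLATE, user='alice', count=2)
# The % operator supports the same deferred pattern with a mapping.
legacy_message = LEGACY_TEMPLATE % {'user': 'alice', 'count': 2}

print(message)
print(legacy_message)
```

In a situation like this the choice between .format and % is driven by where the template comes from rather than by speed.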
