Which is the preferred way to concatenate a string in Python?

眼角桃花 2020-11-22 06:45

Since Python strings are immutable, I was wondering what the most efficient way to concatenate strings is.

I can write it like this:

s += string


        
12 Answers
  • 2020-11-22 07:13

    In Python 3.6 and later, f-strings are an efficient way to concatenate strings.

    >>> name = 'some_name'
    >>> number = 123
    >>>
    >>> f'Name is {name} and the number is {number}.'
    'Name is some_name and the number is 123.'
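    For comparison, a small sketch building the same sentence with + (reusing the name and number values above): + needs an explicit str() for the integer and raises TypeError without it, while the f-string formats each value for you. It only illustrates the conversion, not speed.

```python
name = 'some_name'
number = 123

# With +, every non-str operand must be converted explicitly.
with_plus = 'Name is ' + name + ' and the number is ' + str(number) + '.'

# An f-string formats each embedded expression for you.
with_fstring = f'Name is {name} and the number is {number}.'

print(with_plus == with_fstring)  # True

# Leaving out str() with + is an error, not an implicit conversion.
try:
    'number: ' + number
except TypeError as exc:
    print('TypeError:', exc)
```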
    
  • 2020-11-22 07:16

    If the strings you are concatenating are literals, use string literal concatenation:

    import re

    identifier = re.compile(
        "[A-Za-z_]"       # letter or underscore
        "[A-Za-z0-9_]*"   # letter, digit or underscore
    )
    

    This is useful if you want to comment on part of a string (as above) or if you want to use raw strings or triple quotes for part of a literal but not all.

    Since this happens at the syntax layer it uses zero concatenation operators.
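    A minimal sketch of both points: each adjacent literal keeps its own prefix (a raw string next to an ordinary one below), and because the joining happens at compile time it only works between literals; putting a variable next to a literal is a SyntaxError.

```python
# Adjacent literals are merged by the compiler, so each piece may use its own prefix.
pattern = (
    r"\d+"   # raw string: the backslash is kept as-is
    "\n"     # ordinary string: this is a real newline character
)
print(repr(pattern))  # '\\d+\n'

# Implicit concatenation is syntax, not an operator: it cannot join a variable.
source = 'a = "x"\nb = a "y"\n'
try:
    compile(source, "<example>", "exec")
except SyntaxError:
    print("implicit concatenation only works between literals")
```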

  • 2020-11-22 07:16

    As @jdi mentions, the Python documentation suggests using str.join or io.StringIO for string concatenation, and says that a developer should expect quadratic time from += in a loop, even though there has been an optimisation since Python 2.4. As this answer says:

    If Python detects that the left argument has no other references, it calls realloc to attempt to avoid a copy by resizing the string in place. This is not something you should ever rely on, because it's an implementation detail and because if realloc ends up needing to move the string frequently, performance degrades to O(n^2) anyway.

    Here is an example of real-world code that naively relied on this += optimisation, but the optimisation didn't apply. The code below groups an iterable of short strings into bigger chunks to be used with a bulk API.

    def test_concat_chunk(seq, split_by):
        result = ['']
        for item in seq:
            if len(result[-1]) + len(item) > split_by: 
                result.append('')
            result[-1] += item
        return result
    

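    This code can literally run for hours because of its quadratic time complexity: the growing string lives in a list (result[-1]) rather than in a plain local variable, so CPython's in-place resize does not apply and every += copies the whole chunk built so far. Below are alternatives using the suggested data structures: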

    import io
    
    def test_stringio_chunk(seq, split_by):
        def chunk():
            buf = io.StringIO()
            size = 0
            for item in seq:
                if size + len(item) <= split_by:
                    size += buf.write(item)
                else:
                    yield buf.getvalue()
                    buf = io.StringIO()
                    size = buf.write(item)
            if size:
                yield buf.getvalue()
    
        return list(chunk())
    
    def test_join_chunk(seq, split_by):
        def chunk():
            buf = []
            size = 0
            for item in seq:
                if size + len(item) <= split_by:
                    buf.append(item)
                    size += len(item)
                else:
                    yield ''.join(buf)                
                    buf.clear()
                    buf.append(item)
                    size = len(item)
            if size:
                yield ''.join(buf)
    
        return list(chunk())
    

    And a micro-benchmark:

    import timeit
    import random
    import string
    import matplotlib.pyplot as plt
    
    line = ''.join(random.choices(
        string.ascii_uppercase + string.digits, k=512)) + '\n'
    x = []
    y_concat = []
    y_stringio = []
    y_join = []
    n = 5
    for i in range(1, 11):
        x.append(i)
        seq = [line] * (20 * 2 ** 20 // len(line))
        chunk_size = i * 2 ** 20
        y_concat.append(
            timeit.timeit(lambda: test_concat_chunk(seq, chunk_size), number=n) / n)
        y_stringio.append(
            timeit.timeit(lambda: test_stringio_chunk(seq, chunk_size), number=n) / n)
        y_join.append(
            timeit.timeit(lambda: test_join_chunk(seq, chunk_size), number=n) / n)
    plt.plot(x, y_concat)
    plt.plot(x, y_stringio)
    plt.plot(x, y_join)
    plt.legend(['concat', 'stringio', 'join'], loc='upper left')
    plt.show()
    

  • 2020-11-22 07:18

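    You can also use %-formatting (see https://softwareengineering.stackexchange.com/questions/304445/why-is-s-better-than-for-concatenation for a discussion of %s versus +). Its advantage is readability and automatic str() conversion rather than speed, since the formatted result is still appended with +=.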

    s += "%s" % (stringfromelsewhere,)
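    A short sketch of what %-formatting adds here (the value 42 is just an example): it runs str() on the argument, whereas + refuses to mix str and int. Wrapping the argument in a one-element tuple also protects against the case where the value is itself a tuple, which % would otherwise unpack as its argument list.

```python
# %-formatting runs str() on its argument, so a number needs no explicit conversion.
s = "total: "
s += "%s" % (42,)
print(s)  # total: 42

# The + operator refuses to mix str and int.
try:
    "total: " + 42
except TypeError as exc:
    print("TypeError:", exc)

# A tuple value must be wrapped in a one-element tuple; otherwise % treats it
# as the argument list and raises TypeError (not all arguments converted).
value = (1, 2)
print("%s" % (value,))  # (1, 2)
```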
    
  • 2020-11-22 07:19

    If you are concatenating a lot of values, then neither. Appending to a list is expensive; you can use StringIO for that, especially if you are building the string up over a lot of operations.

    try:
        from cStringIO import StringIO  # Python 2
    except ImportError:
        from io import StringIO  # Python 3
    
    buf = StringIO()
    
    buf.write('foo')
    buf.write('foo')
    buf.write('foo')
    
    buf.getvalue()
    # 'foofoofoo'
    

    If you already have a complete list returned to you from some other operation, then just use ''.join(aList).
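    ''.join() accepts any iterable, not just a list, but every item must already be a str. A small sketch with a hypothetical list of numbers, converting them first:

```python
numbers = [1, 2, 3]  # hypothetical input: not every item is a str

# join() raises TypeError on non-str items, so convert them first.
joined = ', '.join(map(str, numbers))
print(joined)  # 1, 2, 3

# join() takes any iterable, including a generator expression.
words = ('foo', 'bar', 'baz')
shouted = ''.join(w.upper() for w in words)
print(shouted)  # FOOBARBAZ
```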

    From the Python FAQ, "What is the most efficient way to concatenate many strings together?":

    str and bytes objects are immutable, therefore concatenating many strings together is inefficient as each concatenation creates a new object. In the general case, the total runtime cost is quadratic in the total string length.

    To accumulate many str objects, the recommended idiom is to place them into a list and call str.join() at the end:

    chunks = []
    for s in my_strings:
        chunks.append(s)
    result = ''.join(chunks)
    

    (another reasonably efficient idiom is to use io.StringIO)

    To accumulate many bytes objects, the recommended idiom is to extend a bytearray object using in-place concatenation (the += operator):

    result = bytearray()
    for b in my_bytes_objects:
        result += b
    

    Edit: I was silly and had the results pasted backwards, making it look like appending to a list was faster than cStringIO. I have also added tests for bytearray/str concatenation, as well as a second round of tests using a larger list with larger strings (Python 2.7.3).

    IPython test example for large lists of strings:

    try:
        from cStringIO import StringIO  # Python 2
    except ImportError:
        from io import StringIO  # Python 3
    
    source = ['foo']*1000
    
    %%timeit buf = StringIO()
    for i in source:
        buf.write(i)
    final = buf.getvalue()
    # 1000 loops, best of 3: 1.27 ms per loop
    
    %%timeit out = []
    for i in source:
        out.append(i)
    final = ''.join(out)
    # 1000 loops, best of 3: 9.89 ms per loop
    
    %%timeit out = bytearray()
    for i in source:
        out += i
    # 10000 loops, best of 3: 98.5 µs per loop
    
    %%timeit out = ""
    for i in source:
        out += i
    # 10000 loops, best of 3: 161 µs per loop
    
    ## Repeat the tests with a larger list, containing
    ## strings that are bigger than the small string caching 
    ## done by the Python
    source = ['foo']*1000
    
    # cStringIO
    # 10 loops, best of 3: 19.2 ms per loop
    
    # list append and join
    # 100 loops, best of 3: 144 ms per loop
    
    # bytearray() +=
    # 100 loops, best of 3: 3.8 ms per loop
    
    # str() +=
    # 100 loops, best of 3: 5.11 ms per loop
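    The timings above were taken on Python 2.7, where str is a byte string, which is why bytearray += i works there; in Python 3 bytearray needs bytes. Note also that the first line of an IPython %%timeit cell is setup code that runs once per repeat, not once per loop, so buf and out keep growing from one loop to the next, which can skew the comparison. A Python 3 sketch using the standard timeit module, with a fresh accumulator on every call; it prints timings rather than asserting a winner, since results vary by machine and version:

```python
import io
import timeit

source = ['foo'] * 1000
source_bytes = [s.encode() for s in source]  # bytearray += needs bytes in Python 3
NUMBER = 200  # calls per timing run


def with_stringio():
    buf = io.StringIO()  # fresh buffer on every call
    for s in source:
        buf.write(s)
    return buf.getvalue()


def with_join():
    out = []  # fresh list on every call
    for s in source:
        out.append(s)
    return ''.join(out)


def with_bytearray():
    out = bytearray()
    for b in source_bytes:
        out += b
    return bytes(out)


def with_str_concat():
    out = ''
    for s in source:
        out += s
    return out


if __name__ == '__main__':
    for func in (with_stringio, with_join, with_bytearray, with_str_concat):
        # Best of 3 runs, each calling func NUMBER times.
        best = min(timeit.repeat(func, number=NUMBER, repeat=3))
        print(f'{func.__name__}: {best / NUMBER * 1e6:.1f} us per call')
```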
    
  • 2020-11-22 07:21

    My use case was slightly different. I had to construct a query where more than 20 fields were dynamic, so I used the format method:

    query = "insert into {0}({1},{2},{3}) values('{4}', {5}, '{6}')"
    query = query.format('users', 'name', 'age', 'dna', 'suzan', 1010, 'nda')
    

    This was simpler for me than using + or the other approaches.
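    One caveat: format() does not quote or escape anything, so formatting user-supplied values into SQL opens the door to SQL injection. A sketch of a safer variant, assuming the standard-library sqlite3 module and an in-memory table invented for the example: format() still builds the table and column names (placeholders cannot stand in for identifiers, so those must come from trusted code), while the values go through ? placeholders.

```python
import sqlite3

table = 'users'
columns = ['name', 'age', 'dna']   # identifiers: must come from trusted code
values = ('suzan', 1010, 'nda')    # data: passed separately, never formatted in

conn = sqlite3.connect(':memory:')
conn.execute('create table users (name text, age integer, dna text)')

# format() builds only the statement's structure; sqlite3 binds each ? safely.
query = 'insert into {}({}) values({})'.format(
    table, ', '.join(columns), ', '.join('?' * len(columns)))
conn.execute(query, values)

rows = conn.execute('select name, age, dna from users').fetchall()
print(rows)  # [('suzan', 1010, 'nda')]
conn.close()
```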
