Any reason not to use '+' to concatenate two strings?

问题

A common antipattern in Python is to concatenate a sequence of strings using + in a loop. This is bad because the Python interpreter has to create a new string object for each iteration, and it ends up taking quadratic time. (Recent versions of CPython can apparently optimize this in some cases, but other implementations can\'t, so programmers are discouraged from relying on this.) \'\'.join is the right way to do this.

However, I\'ve heard it said (including here on Stack Overflow) that you should never, ever use + for string concatenation, but instead always use \'\'.join or a format string. I don\'t understand why this is the case if you\'re only concatenating two strings. If my understanding is correct, it shouldn\'t take quadratic time, and I think a + b is cleaner and more readable than either \'\'.join((a, b)) or \'%s%s\' % (a, b).

Is it good practice to use + to concatenate two strings? Or is there a problem I\'m not aware of?

回答1:

There is nothing wrong in concatenating two strings with +. Indeed it's easier to read than ''.join([a, b]).

You are right though that concatenating more than 2 strings with + is an O(n^2) operation (compared to O(n) for join) and thus becomes inefficient. However this has not to do with using a loop. Even a + b + c + ... is O(n^2), the reason being that each concatenation produces a new string.

CPython2.4 and above try to mitigate that, but it's still advisable to use join when concatenating more than 2 strings.

回答2:

Plus operator is perfectly fine solution to concatenate two Python strings. But if you keep adding more than two strings (n > 25) , you might want to think something else.

''.join([a, b, c]) trick is a performance optimization.

回答3:

The assumption that one should never, ever use + for string concatenation, but instead always use ''.join may be a myth. It is true that using + creates unnecessary temporary copies of immutable string object but the other not oft quoted fact is that calling join in a loop would generally add the overhead of function call. Lets take your example.

Create two lists, one from the linked SO question and another a bigger fabricated

>>> myl1 = ['A','B','C','D','E','F']
>>> myl2=[chr(random.randint(65,90)) for i in range(0,10000)]

Lets create two functions, UseJoin and UsePlus to use the respective join and + functionality.

>>> def UsePlus():
    return [myl[i] + myl[i + 1] for i in range(0,len(myl), 2)]

>>> def UseJoin():
    [''.join((myl[i],myl[i + 1])) for i in range(0,len(myl), 2)]

Lets run timeit with the first list

>>> myl=myl1
>>> t1=timeit.Timer("UsePlus()","from __main__ import UsePlus")
>>> t2=timeit.Timer("UseJoin()","from __main__ import UseJoin")
>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=100000)/100000)
2.48 usec/pass
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=100000)/100000)
2.61 usec/pass
>>>

They have almost the same runtime.

Lets use cProfile

>>> myl=myl2
>>> cProfile.run("UsePlus()")
         5 function calls in 0.001 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 <pyshell#1376>:1(UsePlus)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {range}


>>> cProfile.run("UseJoin()")
         5005 function calls in 0.029 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.015    0.015    0.029    0.029 <pyshell#1388>:1(UseJoin)
        1    0.000    0.000    0.029    0.029 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     5000    0.014    0.000    0.014    0.000 {method 'join' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {range}

And it looks that using Join, results in unnecessary function calls which could add to the overhead.

Now coming back to the question. Should one discourage the use of + over join in all cases?

I believe no, things should be taken into consideration

Length of the String in Question
No of Concatenation Operation.

And off-course in a development pre-mature optimization is evil.

回答4:

When working with multiple people, it's sometimes difficult to know exactly what's happening. Using a format string instead of concatenation can avoid one particular annoyance that's happened a whole ton of times to us:

Say, a function requires an argument, and you write it expecting to get a string:

In [1]: def foo(zeta):
   ...:     print 'bar: ' + zeta

In [2]: foo('bang')
bar: bang

So, this function may be used pretty often throughout the code. Your coworkers may know exactly what it does, but not necessarily be fully up-to-speed on the internals, and may not know that the function expects a string. And so they may end up with this:

In [3]: foo(23)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/izkata/<ipython console> in <module>()

/home/izkata/<ipython console> in foo(zeta)

TypeError: cannot concatenate 'str' and 'int' objects

There would be no problem if you just used a format string:

In [1]: def foo(zeta):
   ...:     print 'bar: %s' % zeta
   ...:     
   ...:     

In [2]: foo('bang')
bar: bang

In [3]: foo(23)
bar: 23

The same is true for all types of objects that define __str__, which may be passed in as well:

In [1]: from datetime import date

In [2]: zeta = date(2012, 4, 15)

In [3]: print 'bar: ' + zeta
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/izkata/<ipython console> in <module>()

TypeError: cannot concatenate 'str' and 'datetime.date' objects

In [4]: print 'bar: %s' % zeta
bar: 2012-04-15

So yes: If you can use a format string do it and take advantage of what Python has to offer.

回答5:

I have done a quick test:

import sys

str = e = "a xxxxxxxxxx very xxxxxxxxxx long xxxxxxxxxx string xxxxxxxxxx\n"

for i in range(int(sys.argv[1])):
    str = str + e

and timed it:

mslade@mickpc:/binks/micks/ruby/tests$ time python /binks/micks/junk/strings.py  8000000
8000000 times

real    0m2.165s
user    0m1.620s
sys     0m0.540s
mslade@mickpc:/binks/micks/ruby/tests$ time python /binks/micks/junk/strings.py  16000000
16000000 times

real    0m4.360s
user    0m3.480s
sys     0m0.870s

There is apparently an optimisation for the a = a + b case. It does not exhibit O(n^2) time as one might suspect.

So at least in terms of performance, using + is fine.

回答6:

According to Python docs, using str.join() will give you performance consistence across various implementations of Python. Although CPython optimizes away the quadratic behavior of s = s + t, other Python implementations may not.

CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations.

Sequence Types in Python docs (see the foot note [6])

回答7:

''.join([a, b]) is better solution than +.

Because Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such)

form a += b or a = a + b is fragile even in CPython and isn't present at all in implementations that don't use refcounting (reference counting is a technique of storing the number of references, pointers, or handles to a resource such as an object, block of memory, disk space or other resource)

https://www.python.org/dev/peps/pep-0008/#programming-recommendations

来源：https://stackoverflow.com/questions/10043636/any-reason-not-to-use-to-concatenate-two-strings

标签

python

string-concatenation

anti-patterns

Any reason not to use &#39;+&#39; to concatenate two strings?

问题