Question:
As I understand it, `"".join(iterable_of_strings)` is the preferred way to concatenate strings because it allows optimizations that avoid rewriting the immutable string object to memory more times than necessary.
However, adding strings inside an expression reliably runs faster than joining them for a moderately large number of operations for me.
I get about 2.9-3.2 seconds for the joined version and 2.3-2.7 for the added version running this code with Python 3.3 on my laptop. I couldn't find a good answer by Googling. Could someone explain what might be going on, or point me to a good resource?
```python
import uuid
import time

class mock:
    def __init__(self):
        self.name = "foo"
        self.address = "address"
        self.age = "age"
        self.primarykey = uuid.uuid4()

data_list = [mock() for x in range(2000000)]

def added():
    my_dict_list = {}
    t = time.time()
    new_dict = {item.primarykey: item.name + item.address + item.age
                for item in data_list}
    print(time.time() - t)

def joined():
    my_dict_list = {}
    t = time.time()
    new_dict = {item.primarykey: ''.join([item.name, item.address, item.age])
                for item in data_list}
    print(time.time() - t)

joined()
added()
```
Answer 1:
The time difference you're seeing comes from creating the list that gets passed to `join`. While you can get a small speedup by using a tuple instead, it's still going to be slower than just concatenating with `+` when there are only a few short strings.
It would be different if you had an iterable of strings to start with, rather than an object with strings as attributes. Then you could call `join` directly on the iterable, instead of needing to build a new one for each call.
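To isolate just that list-construction cost, here is a small sketch (the variable names are illustrative) comparing a `join` over a pre-existing list with a `join` that must build a fresh list on every call, as the question's code does:

```python
import timeit

strings = ["foo", "bar", "baz"]

# join over a list that already exists: no per-call list construction
reuse = timeit.timeit(lambda: "".join(strings), number=200000)

# join that first builds a new list on every call
rebuild = timeit.timeit(
    lambda: "".join([strings[0], strings[1], strings[2]]),
    number=200000,
)

print(reuse, rebuild)
```

The second variant pays for allocating and populating a three-element list before `join` even runs, which is exactly the overhead this answer describes.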
Here's some testing I did with the `timeit` module:
```python
import timeit

short_strings = ["foo", "bar", "baz"]
long_strings = [s * 1000 for s in short_strings]

def concat(a, b, c):
    return a + b + c

def concat_from_list(lst):
    return lst[0] + lst[1] + lst[2]

def join(a, b, c):
    return "".join([a, b, c])

def join_tuple(a, b, c):
    return "".join((a, b, c))

def join_from_list(lst):
    return "".join(lst)

def test():
    print("Short strings")
    print("{:20}{}".format("concat:",
          timeit.timeit(lambda: concat(*short_strings))))
    print("{:20}{}".format("concat_from_list:",
          timeit.timeit(lambda: concat_from_list(short_strings))))
    print("{:20}{}".format("join:",
          timeit.timeit(lambda: join(*short_strings))))
    print("{:20}{}".format("join_tuple:",
          timeit.timeit(lambda: join_tuple(*short_strings))))
    print("{:20}{}\n".format("join_from_list:",
          timeit.timeit(lambda: join_from_list(short_strings))))
    print("Long Strings")
    print("{:20}{}".format("concat:",
          timeit.timeit(lambda: concat(*long_strings))))
    print("{:20}{}".format("concat_from_list:",
          timeit.timeit(lambda: concat_from_list(long_strings))))
    print("{:20}{}".format("join:",
          timeit.timeit(lambda: join(*long_strings))))
    print("{:20}{}".format("join_tuple:",
          timeit.timeit(lambda: join_tuple(*long_strings))))
    print("{:20}{}".format("join_from_list:",
          timeit.timeit(lambda: join_from_list(long_strings))))
```
Output:
```text
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
>>> test()
Short strings
concat:             0.5453461176251436
concat_from_list:   0.5185697357936024
join:               0.7099379456477868
join_tuple:         0.5900842397209949
join_from_list:     0.4177281794285359

Long Strings
concat:             2.002303591571888
concat_from_list:   1.8898819841869416
join:               1.5672863477837913
join_tuple:         1.4343144915087596
join_from_list:     1.231374639083505
```
So, joining from an already existing list is always fastest. Concatenating with `+` is faster for individual items when they are short, but for long strings it is always slowest. I suspect the difference between `concat` and `concat_from_list` comes from unpacking the lists in the function calls in the test code.
Answer 2:
> As I understand it "".join(iterable_of_strings) is the preferred way to concatenate strings because it allows for optimizations that avoid having to rewrite the immutable object to memory more times than necessary.
You understand somewhat incorrectly. `"".join(iterable_of_strings)` is the preferred way to concatenate an iterable of strings, for the reason you explained.
However, you don't have an iterable of strings; you just have three strings. The fastest way to concatenate three strings is to add them together with `+`, or to use `.format()` or `%`. That's because in your case you first have to create the iterable, and then join the strings, all to avoid copying some quite small strings.
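For three known strings, the alternatives mentioned above all build the same result; a minimal illustration (the variable names are made up to mirror the question's code):

```python
name, address, age = "foo", "address", "age"

added = name + address + age                      # plain concatenation
formatted = "{}{}{}".format(name, address, age)   # str.format
interpolated = "%s%s%s" % (name, address, age)    # percent formatting

# All three produce the identical string
print(added == formatted == interpolated)  # True
```

Which one to use for so few pieces is a readability choice, not a performance one.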
`.join()` doesn't become faster until you have so many strings that using the other methods would make for clumsy code anyway. Where that point lies depends on the Python implementation, the version, and how long the strings are, but we are generally talking about more than ten strings.
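The crossover can be sketched like this; the exact count where `.join()` wins varies by implementation and string length, so treat the numbers as illustrative rather than definitive. Building a string with repeated `+=` may recopy the growing result on each step, while `join` sizes the output once:

```python
import timeit

def concat_loop(parts):
    s = ""
    for p in parts:
        s += p      # may copy the accumulated string on each iteration
    return s

parts = ["x" * 10] * 100    # 100 short strings

t_add = timeit.timeit(lambda: concat_loop(parts), number=20000)
t_join = timeit.timeit(lambda: "".join(parts), number=20000)
print(t_add, t_join)
```

Run it on your own interpreter; CPython has an in-place concatenation optimization that can mask the difference, which is one more reason to measure rather than assume.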
Although it's true that not all implementations have fast concatenation, I've tested CPython, PyPy and Jython, and all of them concatenate a couple of strings as fast as, or faster than, joining them.
In essence, you should choose between `+` and `.join()` based on code clarity, up until your code actually runs too slowly. Then, if you care about speed: profile and benchmark your code. Don't sit and guess.
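As a sketch of that "benchmark, don't guess" advice, `timeit.repeat` with the minimum of several runs is a conventional way to reduce scheduler noise (the statements below are illustrative):

```python
import timeit

setup = 'a, b, c = "foo", "bar", "baz"'
candidates = {
    "concat": "a + b + c",
    "join": '"".join((a, b, c))',
}

for name, stmt in candidates.items():
    # min() of several repeats estimates the best case, with least interference
    best = min(timeit.repeat(stmt, setup=setup, repeat=5, number=100000))
    print("{:10}{:.4f}".format(name, best))
```

Comparing both candidates under identical conditions on your own machine settles the question for your workload in a way no general rule can.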
Some timings: http://slides.colliberty.com/DjangoConEU-2013/#/step-40
With video explanation: http://youtu.be/50OIO9ONmks?t=18m30s
Source: https://stackoverflow.com/questions/20186261/why-is-concatenating-strings-running-faster-than-joining-them