Why is concatenating strings running faster than joining them? [duplicate]

Submitted by 旧时模样 on 2019-12-10 18:20:12

Question


As I understand it "".join(iterable_of_strings) is the preferred way to concatenate strings because it allows for optimizations that avoid having to rewrite the immutable object to memory more times than necessary.

For me, however, adding strings inside an expression is reliably faster than joining them over a moderately large number of operations.

Running the code below with Python 3.3 on my laptop, joined takes about 2.9-3.2 seconds and added about 2.3-2.7 seconds. I couldn't find a good answer by Googling this. Could someone explain what might be going on or direct me to a good resource?

import uuid
import time

class mock:
    def __init__(self):
        self.name = "foo"
        self.address = "address"
        self.age = "age"
        self.primarykey = uuid.uuid4()

data_list = [mock() for x in range(2000000)]

def added():
    my_dict_list = {}
    t = time.time()
    new_dict = { item.primarykey: item.name + item.address + item.age for item in data_list }
    print(time.time() - t)

def joined():
    my_dict_list = {}
    t = time.time()
    new_dict = { item.primarykey: ''.join([item.name, item.address, item.age]) for item in data_list }
    print(time.time() - t)

joined()
added()

Answer 1:


The time difference you're seeing comes from creating the list to be passed to join. And while you can get a small speedup from using a tuple instead, it's still going to be slower than just concatenating with + when there are only a few short strings.

It would be different if you had an iterable of strings to start with, rather than an object with strings as attributes. Then you could call join directly on the iterable, rather than needing to build a new one for each call.
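One way to see the extra list-building step is to inspect the bytecode. This is a minimal sketch that assumes CPython (instruction names vary between versions, and other implementations may not expose bytecode this way). The names join_new_list, add_three and opnames are illustrative only and are not part of the test code in this answer.

```python
import dis

def join_new_list(a, b, c):
    # Builds a fresh three-element list on every call, then joins it.
    return "".join([a, b, c])

def add_three(a, b, c):
    # Two binary additions, no intermediate container.
    return a + b + c

def opnames(func):
    """Return the bytecode instruction names of func, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

if __name__ == "__main__":
    print("join_new_list:", opnames(join_new_list))
    print("add_three:    ", opnames(add_three))
```

On CPython the first listing contains a BUILD_LIST instruction and a method call that the second one does not need, which is consistent with join being slower for a handful of short strings.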

Here's some testing I did with the timeit module:

import timeit

short_strings = ["foo", "bar", "baz"]
long_strings = [s*1000 for s in short_strings]

def concat(a, b, c):
    return a + b + c

def concat_from_list(lst):
    return lst[0] + lst[1] + lst[2]

def join(a, b, c):
    return "".join([a, b, c])

def join_tuple(a, b, c):
    return "".join((a, b, c))

def join_from_list(lst):
    return "".join(lst)

def test():
    print("Short strings")
    print("{:20}{}".format("concat:",
                           timeit.timeit(lambda: concat(*short_strings))))
    print("{:20}{}".format("concat_from_list:",
                           timeit.timeit(lambda: concat_from_list(short_strings))))
    print("{:20}{}".format("join:",
                           timeit.timeit(lambda: join(*short_strings))))
    print("{:20}{}".format("join_tuple:",
                           timeit.timeit(lambda: join_tuple(*short_strings))))
    print("{:20}{}\n".format("join_from_list:",
                             timeit.timeit(lambda: join_from_list(short_strings))))
    print("Long Strings")
    print("{:20}{}".format("concat:",
                           timeit.timeit(lambda: concat(*long_strings))))
    print("{:20}{}".format("concat_from_list:",
                           timeit.timeit(lambda: concat_from_list(long_strings))))
    print("{:20}{}".format("join:",
                           timeit.timeit(lambda: join(*long_strings))))
    print("{:20}{}".format("join_tuple:",
                           timeit.timeit(lambda: join_tuple(*long_strings))))
    print("{:20}{}".format("join_from_list:",
                           timeit.timeit(lambda: join_from_list(long_strings))))

Output:

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
>>> test()
Short strings
concat:             0.5453461176251436
concat_from_list:   0.5185697357936024
join:               0.7099379456477868
join_tuple:         0.5900842397209949
join_from_list:     0.4177281794285359

Long Strings
concat:             2.002303591571888
concat_from_list:   1.8898819841869416
join:               1.5672863477837913
join_tuple:         1.4343144915087596
join_from_list:     1.231374639083505

So, in these runs, joining an already existing list is fastest for both short and long strings. Concatenating with + beats building a new list or tuple to join when the individual strings are short, but it is the slowest option for long strings. I suspect the difference between concat and concat_from_list comes from unpacking the list in the function call in the test code.




Answer 2:


As I understand it "".join(iterable_of_strings) is the preferred way to concatenate strings because it allows for optimizations that avoid having to rewrite the immutable object to memory more times than necessary.

You understand somewhat incorrectly. "".join(iterable_of_strings) is the preferred way to concatenate an iterable of strings, for the reason you explained.

However, you don't have an iterable of strings. You just have three strings. The fastest way to concatenate three strings is to add them together with +, or use .format() or %. This is because, in your case, you first have to create the iterable and then join the strings, all to avoid copying some quite small strings.

.join() doesn't become faster until you have so many strings that it makes for stupid code to use the other methods anyway. When that happens depends on what Python implementation you have, what version and how long the strings are, but we are generally talking about more than ten strings.
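To look for that crossover on a particular machine, something like the following sketch could be used. It generates a chained a + b + ... expression for n operands with eval (a benchmarking trick, safe here only because the expression text is built locally) and times it against join on an existing list. The helper names make_concat, join_parts and compare are hypothetical, the indexing s[i] adds a little overhead to the + side, and the numbers will differ between implementations and versions.

```python
import timeit

def make_concat(n):
    """Build a function that evaluates s[0] + s[1] + ... + s[n-1]."""
    expr = " + ".join("s[{}]".format(i) for i in range(n))
    # The expression is generated here, never taken from user input.
    return eval("lambda s: " + expr)

def join_parts(parts):
    return "".join(parts)

def compare(n, number=20000):
    """Return (seconds for chained +, seconds for join) with n short strings."""
    parts = ["foo"] * n
    concat = make_concat(n)
    t_concat = timeit.timeit(lambda: concat(parts), number=number)
    t_join = timeit.timeit(lambda: join_parts(parts), number=number)
    return t_concat, t_join

if __name__ == "__main__":
    for n in (3, 10, 100):
        t_concat, t_join = compare(n)
        print("{:>4} strings  +: {:.4f}s  join: {:.4f}s".format(n, t_concat, t_join))
```

Note that join_parts receives a list that already exists, so this measures the best case for join; wrapping separate variables in a new list first would shift the crossover further out.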

Although it's true that not all implementations have fast concatenation, I've tested CPython, PyPy and Jython, and in all of them concatenation is as fast or faster for just a couple of strings.

In essence, you should choose between + and .join() based on code clarity until your code is actually running. Then, if you care about speed: profile and benchmark your code. Don't sit and guess.
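As a rough illustration of benchmarking rather than guessing, here is a sketch that times the question's two dictionary builds with timeit.repeat and keeps the best run, which tends to be less noisy than a single time.time() delta. Record is a hypothetical stand-in for the question's mock class, the record count is reduced to keep the run short, and best_time is an illustrative helper rather than a standard API.

```python
import timeit
import uuid

class Record:
    """Hypothetical stand-in for the question's `mock` class."""
    def __init__(self):
        self.name = "foo"
        self.address = "address"
        self.age = "age"
        self.primarykey = uuid.uuid4()

def build_added(records):
    return {r.primarykey: r.name + r.address + r.age for r in records}

def build_joined(records):
    return {r.primarykey: "".join([r.name, r.address, r.age]) for r in records}

def best_time(func, records, repeat=5, number=3):
    """Smallest total over `repeat` runs of `number` calls each."""
    return min(timeit.repeat(lambda: func(records), repeat=repeat, number=number))

if __name__ == "__main__":
    records = [Record() for _ in range(50000)]
    # Both variants must produce the same mapping before timing means anything.
    assert build_added(records) == build_joined(records)
    print("added: ", best_time(build_added, records))
    print("joined:", best_time(build_joined, records))
```

Taking the minimum of several repeats reduces the influence of other processes on the machine; for finding where time goes inside a larger program, a profiler such as the standard library's cProfile is the more appropriate tool.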

Some timings: http://slides.colliberty.com/DjangoConEU-2013/#/step-40

With video explanation: http://youtu.be/50OIO9ONmks?t=18m30s



Source: https://stackoverflow.com/questions/20186261/why-is-concatenating-strings-running-faster-than-joining-them
