Python sum, why not strings? [closed]

前端未结

关注

 8  1817

臣服心动

相关标签:

8条回答

慢半拍i

2020-11-29 03:57
Edit: Moved the parts about immutability to history.

Basically, its a question of preallocation. When you use a statement such as
```
sum(["a", "b", "c", ..., ])
```
and expect it to work similar to a reduce statement, the code generated looks something like
```
v1 = "" + "a" # must allocate v1 and set its size to len("") + len("a")
v2 = v1 + "b" # must allocate v2 and set its size to len("a") + len("b")
...
res = v10000 + "$" # must allocate res and set its size to len(v9999) + len("$")
```
In each of these steps a new string is created, which for one might give some copying overhead as the strings are getting longer and longer. But that’s maybe not the point here. What’s more important, is that every new string on each line must be allocated to it’s specific size (which. I don’t know it it must allocate in every iteration of the reduce statement, there might be some obvious heuristics to use and Python might allocate a bit more here and there for reuse – but at several points the new string will be large enough that this won’t help anymore and Python must allocate again, which is rather expensive.

A dedicated method like join, however has the job to figure out the real size of the string before it starts and would therefore in theory only allocate once, at the beginning and then just fill that new string, which is much cheaper than the other solution.
0 讨论(0)
发布评论:

提交评论
- 加载中...
佛祖请我去吃肉

2020-11-29 04:04

Short answer: Efficiency.

Long answer: The sum function has to create an object for each partial sum.

Assume that the amount of time required to create an object is directly proportional to the size of its data. Let N denote the number of elements in the sequence to sum.

doubles are always the same size, which makes sum's running time O(1)×N = O(N).

int (formerly known as long) is arbitary-length. Let M denote the absolute value of the largest sequence element. Then sum's worst-case running time is lg(M) + lg(2M) + lg(3M) + ... + lg(NM) = N×lg(M) + lg(N!) = O(N log N).

For str (where M = the length of the longest string), the worst-case running time is M + 2M + 3M + ... + NM = M×(1 + 2 + ... + N) = O(N²).

Thus, summing strings would be much slower than summing numbers.

str.join does not allocate any intermediate objects. It preallocates a buffer large enough to hold the joined strings, and copies the string data. It runs in O(N) time, much faster than sum.

0 讨论(0)
发布评论:

提交评论
- 加载中...

上一页 1 2

热议问题