Joining strings. Generator or list comprehension?


Consider the problem of extracting the alphabetic characters from a huge string.

One way to do it is:

''.join([c for c in hugestring if c.isalpha()])

3 Answers

盖世英雄少女心

2020-12-03 20:06
When you call str.join(gen) where gen is a generator, Python does the equivalent of list(gen) before going on to examine the length of the resulting sequence.

Specifically, if you look at the code implementing str.join in CPython, you'll see this call:
```
    fseq = PySequence_Fast(seq, "can only join an iterable");
```
The call to PySequence_Fast converts the seq argument into a list if it wasn't a list or tuple already.

So, the two versions of your call are handled almost identically. In the list comprehension version, you build the list yourself and pass it to join. In the generator expression version, the generator object you pass in gets turned into a list right at the start of join, and the rest of the code operates the same way for both versions.
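As a rough illustration of the behaviour described above, here is a pure-Python sketch of the steps: materialize the argument, measure it, then copy. The function name `join_sketch` is hypothetical, and this is only a model of the idea, not CPython's actual implementation (which is written in C).

```python
def join_sketch(sep, iterable):
    """Illustrative model of str.join; not the real CPython code."""
    # Step 1: anything that is not already a list or tuple is turned
    # into a list first (the role PySequence_Fast plays in the C code).
    items = iterable if isinstance(iterable, (list, tuple)) else list(iterable)

    # Step 2: a first pass over the materialized items computes the
    # total size of the result, separators included.
    total = sum(len(item) for item in items) + len(sep) * max(len(items) - 1, 0)

    # Step 3: a second pass copies every piece into a buffer of that size.
    buffer = [""] * total
    pos = 0
    for index, item in enumerate(items):
        if index:
            buffer[pos:pos + len(sep)] = sep
            pos += len(sep)
        buffer[pos:pos + len(item)] = item
        pos += len(item)

    # The built-in join is used only to turn the character buffer back
    # into a str; the sketch is about the steps before this point.
    return "".join(buffer)


hugestring = "a1b2c3"  # stand-in for the huge string in the question
from_generator = join_sketch("", (c for c in hugestring if c.isalpha()))
from_list = join_sketch("", [c for c in hugestring if c.isalpha()])
```

Under this model both calls reach step 2 holding a fully built list, so a generator saves no memory here; it only changes who builds the list.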

臣服心动

2020-12-03 20:07

join() does not have to be implemented by appending each element of the sequence to an ever-longer accumulated string (which would indeed be very slow for long sequences); it only has to produce the same result. So join() is probably just appending characters to some internal memory buffer and creating a string from it at the end. The list comprehension, on the other hand, first has to construct the whole list (by iterating over hugestring), and only then can join() begin its work.

Also, I doubt that join() looks at the list's length, since it can't know that each element is a single character (in most cases, it won't be); it probably just obtains an iterator from the list.
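To make the first point of this answer concrete (only the result matters, not the strategy used to build it), here is a small sketch with two hypothetical helpers, `concat_naive` and `concat_buffered`. Neither is claimed to be how join() really works; they only show that repeated appending and writing into a buffer produce the same string.

```python
import io


def concat_naive(parts):
    # Repeated concatenation: each += may copy the string built so far.
    result = ""
    for part in parts:
        result += part
    return result


def concat_buffered(parts):
    # Writes each piece into an in-memory buffer and builds the final
    # string once, at the end.
    buffer = io.StringIO()
    for part in parts:
        buffer.write(part)
    return buffer.getvalue()


hugestring = "ab12cd34"  # stand-in for the huge string in the question
letters = [c for c in hugestring if c.isalpha()]
naive_result = concat_naive(letters)
buffered_result = concat_buffered(letters)
```

Both helpers accept any iterable of strings, including a generator expression.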


无人共我

2020-12-03 20:07
At least on my machine, the list comprehension is faster for the case I tested, likely because ''.join can optimize its memory allocation. The result probably depends on the specific example you test (e.g., if the condition you're testing is true less often, the price CPython pays for not knowing the length ahead of time may be smaller):
```
# Setup assumed by this session: import string; import numpy as np
In [18]: s = ''.join(np.random.choice(list(string.printable), 1000000))

In [19]: %timeit ''.join(c for c in s if c.isalpha())
10 loops, best of 3: 69.1 ms per loop

In [20]: %timeit ''.join([c for c in s if c.isalpha()])
10 loops, best of 3: 61.8 ms per loop
```
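For anyone who wants to repeat the comparison without IPython or NumPy, here is a standard-library sketch. The helper names (`make_text`, `with_generator`, `with_list`, `best_time`), the seed and the input size are my own choices for illustration, and the timings will differ by machine and Python version.

```python
import random
import string
import timeit


def make_text(length, seed=0):
    # Random printable characters, similar in spirit to the session above.
    rng = random.Random(seed)
    return "".join(rng.choices(string.printable, k=length))


def with_generator(text):
    return "".join(c for c in text if c.isalpha())


def with_list(text):
    return "".join([c for c in text if c.isalpha()])


def best_time(func, text, number=5, repeat=3):
    # Best average time per call, in seconds, over several repeats.
    runs = timeit.repeat(lambda: func(text), number=number, repeat=repeat)
    return min(runs) / number


if __name__ == "__main__":
    text = make_text(200_000)
    # Both variants must agree before the timings mean anything.
    assert with_generator(text) == with_list(text)
    print(f"generator expression: {best_time(with_generator, text) * 1e3:.2f} ms")
    print(f"list comprehension:   {best_time(with_list, text) * 1e3:.2f} ms")
```

Changing the filter (for example to a condition that rarely matches) is an easy way to check how much the gap depends on the data.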
