Efficiently split a string using multiple separators and retaining each separator?


野趣味

I need to split strings of data using each character from string.punctuation and string.whitespace as a separator.

Furthermore, I need for the


9 Answers

鱼传尺愫

2021-02-02 11:34

For any arbitrary collection of separators:

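def separate(myStr, seps):
    # Split myStr on every character found in seps, keeping each separator
    # as its own element of the returned list.
    answer = []
    temp = []  # characters of the piece currently being built
    for char in myStr:
        if char in seps:
            # Close the current piece (possibly empty), then emit the separator.
            answer.append(''.join(temp))
            answer.append(char)
            temp = []
        else:
            temp.append(char)
    answer.append(''.join(temp))  # trailing piece after the last separator
    return answer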

In [4]: print(separate("Now is the winter of our discontent", set(' ')))
['Now', ' ', 'is', ' ', 'the', ' ', 'winter', ' ', 'of', ' ', 'our', ' ', 'discontent']

In [5]: print(separate("Now, really - it is the winter of our discontent", set(' ,-')))
['Now', ',', '', ' ', 'really', ' ', '', '-', '', ' ', 'it', ' ', 'is', ' ', 'the', ' ', 'winter', ' ', 'of', ' ', 'our', ' ', 'discontent']

Hope this helps
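
The same per-character splitting can also be sketched with `re.split` and a capturing group, which makes the regex engine return the separators along with the pieces. This is an alternative to the loop above, not part of the original answer; the name `split_keep` is hypothetical, and the default separator set assumes the `string.punctuation` plus `string.whitespace` requirement from the question.

```python
import re
from string import punctuation, whitespace

def split_keep(text, seps=punctuation + whitespace):
    # Hypothetical helper: seps must be a non-empty string of separator characters.
    # re.escape makes characters such as ']' or '-' safe inside the character class;
    # the capturing group tells re.split to keep each matched separator.
    pattern = '([' + re.escape(seps) + '])'
    return re.split(pattern, text)

print(split_keep("Now, really - it is", " ,-"))
```

As with the loop version, adjacent separators produce empty strings between them; filter those out if they are not wanted.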


闹比i

2021-02-02 11:36
Depending on the text you are dealing with, you may be able to simplify your notion of a delimiter to "anything other than letters and digits". If that works for your data, you can use the following regex solution:
```
import re
re.findall(r'[a-zA-Z\d]+|[^a-zA-Z\d]', text)
```
This assumes that you want to split on each individual delimiter character even when several occur consecutively, so 'foo..bar' would become ['foo', '.', '.', 'bar']. If you instead expect ['foo', '..', 'bar'], use [a-zA-Z\d]+|[^a-zA-Z\d]+ (the only difference is the + added at the very end).
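
A minimal sketch comparing the two patterns on the 'foo..bar' example from this answer; the variable names `single` and `grouped` are only illustrative.

```python
import re

text = "foo..bar"

# One token per delimiter character.
single = re.findall(r'[a-zA-Z\d]+|[^a-zA-Z\d]', text)
# Runs of consecutive delimiters are grouped into a single token.
grouped = re.findall(r'[a-zA-Z\d]+|[^a-zA-Z\d]+', text)

print(single)   # ['foo', '.', '.', 'bar']
print(grouped)  # ['foo', '..', 'bar']
```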

挽巷

2021-02-02 11:42

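from functools import reduce  # reduce is not a builtin in Python 3
from string import punctuation

s = "..test. and stuff"

# Pad each punctuation character with spaces so that str.split() isolates it.
# Note: whitespace separators are discarded; only punctuation is kept.
f = lambda s, c: s + ' ' + c + ' ' if c in punctuation else s + c
# Start reduce from '' so a leading punctuation character is padded as well.
l = sum([reduce(f, word, '').split() for word in s.split()], [])

print(l)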
