Tokenize a string keeping delimiters in Python

无人共我 2021-02-05 10:32

Is there any equivalent to str.split in Python that also returns the delimiters?

I need to preserve the whitespace layout for my output after processing som

5 Answers
  •  醉酒成梦
    2021-02-05 11:05

    The re module provides this functionality:

    >>> import re
    >>> re.split(r'(\W+)', 'Words, words, words.')
    ['Words', ', ', 'words', ', ', 'words', '.', '']
    

    (quoted from the Python documentation).

    For your example (split on whitespace), use re.split(r'(\s+)', '\tThis is an example').

    The key is to enclose the regex on which to split in capturing parentheses. That way, the delimiters are added to the list of results.
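As a minimal sketch of the difference the capturing parentheses make, the snippet below splits the same string with and without them. The sample text and variable names are illustrative assumptions, not part of the original question.

```python
import re

# Illustrative input (an assumption): note the double space after "is".
text = "This is  an example"

# Without a capturing group, the whitespace delimiters are discarded.
without_delims = re.split(r"\s+", text)

# With a capturing group, each delimiter run is returned between the tokens.
with_delims = re.split(r"(\s+)", text)

print(without_delims)  # ['This', 'is', 'an', 'example']
print(with_delims)     # ['This', ' ', 'is', '  ', 'an', ' ', 'example']

# Joining the pieces gives back the original string, layout included.
assert "".join(with_delims) == text
```

Because the delimiters sit at the odd indices of the result, the tokens at the even indices can be processed and the list joined again without disturbing the whitespace.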

    Edit: As pointed out, any leading or trailing delimiters are also added to the list, with an empty string next to them at that end. To avoid that, call .strip() on your input string first.
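The effect described in the edit can be seen with the tab-prefixed example string. This is only a sketch: the last step, which filters out empty strings instead of stripping, is an alternative assumed here for the case where leading whitespace must be kept, and is not something the answer itself proposes.

```python
import re

text = "\tThis is an example"

# A leading delimiter yields an empty string before it in the result.
tokens = re.split(r"(\s+)", text)
print(tokens)  # ['', '\t', 'This', ' ', 'is', ' ', 'an', ' ', 'example']

# Stripping first removes the leading tab, so layout at the edges is lost.
stripped_tokens = re.split(r"(\s+)", text.strip())
print(stripped_tokens)  # ['This', ' ', 'is', ' ', 'an', ' ', 'example']

# Alternative (an assumption, not from the answer): keep every delimiter
# and drop only the empty strings.
non_empty = [t for t in tokens if t]
print(non_empty)  # ['\t', 'This', ' ', 'is', ' ', 'an', ' ', 'example']
assert "".join(non_empty) == text
```

Which variant fits depends on whether whitespace at the start and end of the input is part of the layout that has to survive.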
