Efficiently split a string using multiple separators while retaining each separator?

野趣味 2021-02-02 10:44

I need to split strings of data using each character from string.punctuation and string.whitespace as a separator.

Furthermore, I need for the

9 answers
  •  星月不相逢
    2021-02-02 11:29

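    import re
    import string
    
    # One character class holding every separator; re.escape protects
    # class-special characters such as ], ^, - and \.
    p = re.compile("[^{0}]+|[{0}]+".format(re.escape(
        string.punctuation + string.whitespace)))
    
    print(p.findall("Now is the winter of our discontent"))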
    

    I'm no big fan of using regexps for all problems, but I don't think you have much choice in this if you want it fast and short.

    I'll explain the regexp since you're not familiar with it:

    • [...] matches any one of the characters inside the square brackets
    • [^...] matches any one character that is not inside the square brackets
    • a trailing + means one or more repetitions of the preceding item
    • x|y matches either x or y

    So the regexp matches a run of one or more characters that are either all separators (punctuation or whitespace) or all non-separators. The findall method returns all non-overlapping matches of the pattern, in the order they occur in the string.
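
To show how the pattern behaves on input that contains punctuation, here is a minimal sketch that wraps it in a helper. The names split_keep_separators, RUNS and SINGLE are hypothetical. The SINGLE variant is an assumption about what "retaining each separator" might mean: it drops the + on the separator class so that adjacent separators come back one at a time instead of as a single run.

```python
import re
import string

SEPARATORS = string.punctuation + string.whitespace
_CLASS = re.escape(SEPARATORS)

# Same pattern as in the answer: consecutive separators stay together.
RUNS = re.compile("[^{0}]+|[{0}]+".format(_CLASS))

# Assumed variant: no "+" after the separator class, so every
# separator character becomes its own token.
SINGLE = re.compile("[^{0}]+|[{0}]".format(_CLASS))


def split_keep_separators(text, pattern=RUNS):
    """Return the pieces of text, separators included, in original order."""
    return pattern.findall(text)


text = "Hello, world!"
tokens = split_keep_separators(text)
singles = split_keep_separators(text, SINGLE)

print(tokens)   # ['Hello', ', ', 'world', '!']
print(singles)  # ['Hello', ',', ' ', 'world', '!']
```

Because every character belongs to exactly one of the two classes, joining the returned pieces with "".join(...) gives back the original string in both variants.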
