I need to split strings of data using each character from string.punctuation and string.whitespace as a separator.
Furthermore, I need for the
import re
import string

# Match either a run of non-separator characters or a run of separator characters.
p = re.compile("[^{0}]+|[{0}]+".format(re.escape(
    string.punctuation + string.whitespace)))
print(p.findall("Now is the winter of our discontent"))
I'm no big fan of using regexps for all problems, but I don't think you have much choice in this if you want it fast and short.
I'll explain the regexp since you're not familiar with it:
[...] means any of the characters inside the square brackets
[^...] means any of the characters not inside the square brackets
+ behind an item means one or more of the previous thing
x|y means to match either x or y
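To see each of those pieces on its own, here is a small sketch. The sample string and the reduced separator set (comma, exclamation mark, space) are made up for illustration; they are not the full string.punctuation + string.whitespace set.

```python
import re

text = "ab, cd!"  # illustrative sample, not from the question

# [...] : any one of the characters inside the brackets
listed = re.findall(r"[,!]", text)

# [^...] : any one character NOT inside the brackets
not_listed = re.findall(r"[^,! ]", text)

# + : one or more of the previous thing, so whole runs are matched
runs = re.findall(r"[^,! ]+", text)

# x|y : either a run of non-separators or a run of separators
both = re.findall(r"[^,! ]+|[,! ]+", text)

print(listed)      # [',', '!']
print(not_listed)  # ['a', 'b', 'c', 'd']
print(runs)        # ['ab', 'cd']
print(both)        # ['ab', ', ', 'cd', '!']
```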
So the regexp matches one or more characters where either all of them are punctuation or whitespace, or none of them are. The findall method finds all non-overlapping matches of the pattern.
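As a rough sketch of what that gives you in practice: the same pattern is wrapped in a helper below, and the names tokenize, SEPARATORS and the sample sentence are my own, picked just for illustration. Filtering out the separator runs afterwards is an assumption about what you might want, not something the pattern does by itself.

```python
import re
import string

SEPARATORS = string.punctuation + string.whitespace
pattern = re.compile("[^{0}]+|[{0}]+".format(re.escape(SEPARATORS)))

def tokenize(text):
    """Return alternating runs of non-separator and separator characters."""
    return pattern.findall(text)

sample = "Hello, world! Bye"
tokens = tokenize(sample)
print(tokens)  # ['Hello', ', ', 'world', '! ', 'Bye']

# Every character lands in exactly one match, so joining restores the input.
print("".join(tokens) == sample)  # True

# Keep only the runs that do not consist of separator characters.
words = [t for t in tokens if t[0] not in SEPARATORS]
print(words)  # ['Hello', 'world', 'Bye']
```

Because a match is either all separators or all non-separators, checking the first character of each token is enough to tell the two kinds apart.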