Question
More specifically, I want to split a string on any non-alphanumeric character, but when the delimiter is not whitespace I want to keep it. That is, for the input:
my_string = 'Hey, I\'m 9/11 7-11'
I want to get:
['Hey' , ',' , 'I' , "'" , 'm', '9' , '/' , '11', '7' , '-' , '11']
With no whitespace as a list element.
I have tried the following:
re.split('([/\'\-_,.;])|\s', my_string)
But it outputs:
['Hey', ',', '', None, 'I', "'", 'm', None, '9', '/', '11', None, '7', '-', '11']
How do I solve this without 'unnecessary' iterations?
Also, I have some trouble escaping the backslash character, since '\\\\' does not seem to work. Any ideas on how to solve this as well?
Thanks a lot.
Answer 1:
You may use
import re
my_string = "Hey, I'm 9/11 7-11"
print(re.findall(r'\w+|[^\w\s]', my_string))
# => ['Hey', ',', 'I', "'", 'm', '9', '/', '11', '7', '-', '11']
The \w+|[^\w\s] regex matches either one or more word characters (letters, digits, and underscores) or a single character that is neither a word character nor whitespace.
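To make the two alternatives concrete, here is a minimal sketch that wraps the pattern in a helper. The names TOKEN_RE, ALNUM_TOKEN_RE, tokenize, and tokenize_alnum are made up for illustration and are not part of the original answer. Because \w includes the underscore, the second variant is an assumption about what you may want if _ should count as a delimiter too, as in the character class from the question.

```python
import re

# Pattern from the answer: a run of word characters, or one
# non-word, non-whitespace character. Whitespace is never matched,
# so it never shows up in the result.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

# Hypothetical variant: [^\W_]+ matches word characters except "_",
# and a lone "_" is returned as its own token.
ALNUM_TOKEN_RE = re.compile(r"[^\W_]+|[^\w\s]|_")


def tokenize(text):
    """Return word runs and single punctuation marks, skipping whitespace."""
    return TOKEN_RE.findall(text)


def tokenize_alnum(text):
    """Like tokenize, but treat the underscore as a kept delimiter."""
    return ALNUM_TOKEN_RE.findall(text)


print(tokenize("Hey, I'm 9/11 7-11"))
# ['Hey', ',', 'I', "'", 'm', '9', '/', '11', '7', '-', '11']
print(tokenize("snake_case, a-b"))
# ['snake_case', ',', 'a', '-', 'b']
print(tokenize_alnum("snake_case, a-b"))
# ['snake', '_', 'case', ',', 'a', '-', 'b']
```

Using findall instead of split sidesteps the None and empty-string entries, since findall only returns what the pattern matched.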
BTW, to match a backslash with a regex, you need to use \\ in a raw string literal (r'\\') or 4 backslashes in a regular one ('\\\\'). It is recommended to use raw string literals to define a regex pattern in Python.
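The following short sketch illustrates the backslash point. The sample strings and variable names are invented for the example. It checks that both spellings produce the same two-character pattern, and that the tokenizing regex above already keeps a backslash as its own element without any extra escaping.

```python
import re

raw_pattern = r"\\"       # raw literal: the string holds two backslashes
plain_pattern = "\\\\"    # regular literal: four typed, two after unescaping

# Both literals are the same regex, which matches one literal backslash.
assert raw_pattern == plain_pattern
assert len(raw_pattern) == 2

path = r"C:\temp\file.txt"             # sample input with single backslashes
parts = re.split(raw_pattern, path)
print(parts)                           # ['C:', 'temp', 'file.txt']

# A backslash is a non-word, non-whitespace character, so [^\w\s]
# already captures it as a separate token.
tokens = re.findall(r"\w+|[^\w\s]", r"a\b c")
print(tokens)                          # ['a', '\\', 'b', 'c']
```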
Source: https://stackoverflow.com/questions/43620776/how-do-i-split-a-string-on-different-delimiters-but-keeping-on-the-output-some