I'm using regular expressions (the re module) in Python and trying to split a chunk of text by period and by exclamation mark. However, when I split it, I get a "None" in the result.
This happens because re.split() returns None for each capturing group that did not take part in a match.
You can use filter to remove these None values.
>>> import re
>>> a = "This is my text...I want it to split by periods. I also want it to split \
... by exclamation marks! Is that so much to ask?"
>>> list(filter(lambda x: x is not None, re.split(r'((?<=\w)\.(?!\..))|(!)', a)))
['This is my text...I want it to split by periods', '.', ' I also want it to split by exclamation marks', '!', ' Is that so much to ask?']
Try the following:
re.split(r'((?<=\w)\.(?!\..)|!)', a)
You get the None because you have two capturing groups, and all groups are included as part of the re.split() result.
So any time you match a '.', the second capture group is None, and any time you match a '!', the first capture group is None.
Here is the result:
['This is my text...I want it to split by periods',
'.',
' I also want it to split by exclamation marks',
'!',
' Is that so much to ask?']
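To make the capture-group explanation concrete, here is a minimal sketch that runs both patterns on the same text and compares the results. The variable names two_groups and one_group are just illustrative.

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

# Two separate capturing groups: each match fills one group,
# and the other group is reported as None.
two_groups = re.split(r'((?<=\w)\.(?!\..))|(!)', a)
print(two_groups)

# One capturing group around the whole alternation: every match fills it,
# so no None appears.
one_group = re.split(r'((?<=\w)\.(?!\..)|!)', a)
print(one_group)
```

The first list should contain a None next to each delimiter, while the second should contain only strings.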
If you don't want to include '.' and '!' in your result, just remove the parentheses that surround the entire expression: r'(?<=\w)\.(?!\..)|!'
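As a quick check of the version without the outer parentheses, the sketch below splits the same text and drops the delimiters. The strip() step is an extra cleanup assumed here for tidier output; it is not part of the original suggestion.

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

# No capturing group, so re.split() leaves the delimiters out of the result.
parts = re.split(r'(?<=\w)\.(?!\..)|!', a)

# Optional cleanup (an assumption, not from the answer): trim the leading spaces.
sentences = [p.strip() for p in parts]
print(sentences)
```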
Here's a simpler expression (any period not preceded or followed by a period), with the capturing group around the whole alternation (the | clause), not just the first part, to avoid the None:
re.split(r'((?<!\.)\.(?!\.)|!)', a)
# Result:
# ['This is my text...I want it to split by periods',
# '.',
# ' I also want it to split by exclamation marks',
# '!',
# ' Is that so much to ask?']