re.split with multiple alternatives | (or) returns None in Python

南旧 2021-02-15 12:35

I'm using a regular expression in Python and trying to split a chunk of text on periods and exclamation marks. However, when I split it, I get "None" in the result.

3 answers
  • 2021-02-15 13:27

    This happens because the pattern has two capturing groups, and re.split() returns every group for each match; the group that did not take part in the match comes back as None.

    You can filter out these None values.

    >>> import re
    >>> a = "This is my text...I want it to split by periods. I also want it to split \
    by exclamation marks! Is that so much to ask?"
    
    >>> [x for x in re.split(r'((?<=\w)\.(?!\..))|(!)', a) if x is not None]
    
    ['This is my text...I want it to split by periods', '.', ' I also want it to split by exclamation marks', '!', ' Is that so much to ask?']
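
To see where the None values come from, here is a minimal sketch that prints the unfiltered result first. The variable names (`text`, `two_groups`, `raw_parts`, `parts`) are illustrative and not from the original post; the pattern is the one used in the answer above.

```python
import re

text = ("This is my text...I want it to split by periods. "
        "I also want it to split by exclamation marks! Is that so much to ask?")

# Two separate capturing groups: one for the period, one for the exclamation mark.
two_groups = r'((?<=\w)\.(?!\..))|(!)'

# re.split() emits every capturing group for each match; the group that
# did not take part in a given match is reported as None.
raw_parts = re.split(two_groups, text)
print(raw_parts)

# Drop only the None placeholders (empty strings, if any, are kept).
parts = [p for p in raw_parts if p is not None]
print(parts)
```

Comparing with `is not None` keeps legitimate empty strings, which `filter(None, ...)` would also discard.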
    
  • 2021-02-15 13:29

    Try the following:

    re.split(r'((?<=\w)\.(?!\..)|!)', a)
    

    You get the None because you have two capturing groups, and all groups are included as a part of the re.split() result.

    So any time you match a . the second capture group is None, and any time you match a ! the first capture group is None.

    Here is the result:

    ['This is my text...I want it to split by periods',
     '.',
     ' I also want it to split by exclamation marks',
     '!',
     ' Is that so much to ask?']
    

    If you don't want to include '.' and '!' in your result, just remove the parentheses that surround the entire expression: r'(?<=\w)\.(?!\..)|!'
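
The two variants described above can be compared side by side. This is a sketch under assumptions: the names `pattern`, `without_delims`, `with_delims` and `sentences` are hypothetical, and the last step (gluing each delimiter back onto its sentence) is an extra illustration, not something the original answer proposes.

```python
import re

text = ("This is my text...I want it to split by periods. "
        "I also want it to split by exclamation marks! Is that so much to ask?")

pattern = r'(?<=\w)\.(?!\..)|!'

# No capturing group: the delimiters are dropped from the result.
without_delims = re.split(pattern, text)

# One capturing group around the whole alternation: the delimiters are
# kept as separate items and no None can appear.
with_delims = re.split('(' + pattern + ')', text)

# Optional: reattach each delimiter to the text before it.
# Even indexes hold text, odd indexes hold the delimiter that followed it.
sentences = [
    with_delims[i] + (with_delims[i + 1] if i + 1 < len(with_delims) else '')
    for i in range(0, len(with_delims), 2)
]
sentences = [s.strip() for s in sentences if s.strip()]

print(without_delims)
print(with_delims)
print(sentences)
```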

  • 2021-02-15 13:30

    Here's a simpler expression (any period neither preceded nor followed by another period), with a single capturing group around the whole alternation (not just its first part) so that no None appears:

    re.split(r'((?<!\.)\.(?!\.)|!)', a)
    
    # Result:
    # ['This is my text...I want it to split by periods', 
    #  '.', 
    #  ' I also want it to split by exclamation marks', 
    #  '!', 
    #  ' Is that so much to ask?']
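
The two lookaround patterns give the same result on the sample text but not on every input. The sketch below compares them on two made-up strings; the names `word_before` and `no_adjacent_dot` and both sample strings are assumptions chosen for illustration.

```python
import re

word_before = r'((?<=\w)\.(?!\..)|!)'     # period must follow a word character
no_adjacent_dot = r'((?<!\.)\.(?!\.)|!)'  # period must have no neighbouring period

# A period right after a closing parenthesis is not preceded by a word character.
sample = "It works (mostly). Try it!"
after_paren_word = re.split(word_before, sample)
after_paren_dot = re.split(no_adjacent_dot, sample)
print(after_paren_word)  # ['It works (mostly). Try it', '!', '']
print(after_paren_dot)   # ['It works (mostly)', '.', ' Try it', '!', '']

# Two periods at the very end: (?!\..) needs a character after the second
# period in order to reject the first one, so the first pattern still splits.
trailing = "Wait.."
trailing_word = re.split(word_before, trailing)
trailing_dot = re.split(no_adjacent_dot, trailing)
print(trailing_word)     # ['Wait', '.', '.']
print(trailing_dot)      # ['Wait..']
```

The trailing empty string in the first two results appears because the text ends with a delimiter; filter it out if it is not wanted.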
    