Match everything delimited by another regex?

再見小時候 2021-01-26 21:31

I'm currently trying to make a regex that will find all the sentences in a block of text, and so far I've got this:

(?=(?

        
2 answers
  • 2021-01-26 21:37

    (Moved from your closed newer question)
    In your case, the lookbehinds should come before the periods.
    Condensing your expression gives the patterns below.

    Update: to get the text between the delimiters, you could just split and discard the delimiters:

     # (?:(?<!mr)(?<!mrs)\.|\?|!)+
    
     (?:
          (?<! mr )
          (?<! mrs )
          \.
       |  \?
       |  !
     )+
    

    Or, split keeping delimiters

     # ((?:(?<!mr)(?<!mrs)\.|\?|!)+)
    
     (
          (?:
               (?<! mr )
               (?<! mrs )
               \.
            |  \?
            |  !
          )+
     )
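
The two patterns above can be tried out in Python with `re.split`. This is only a minimal sketch: the helper names `split_discarding` and `split_keeping` and the sample text are made up for illustration, and the sample uses lowercase "mr" because the lookbehinds are written in lowercase.

```python
import re
from itertools import zip_longest

# Delimiter run: a period not preceded by "mr" or "mrs", or a "?" or "!".
DELIM = r"(?:(?<!mr)(?<!mrs)\.|\?|!)+"


def split_discarding(text):
    """Split on delimiter runs and throw the delimiters away."""
    return [s.strip() for s in re.split(DELIM, text) if s.strip()]


def split_keeping(text):
    """Split on delimiter runs and re-attach each run to its sentence."""
    # With one capturing group, re.split alternates body, delimiter, body, ...
    parts = re.split("(" + DELIM + ")", text)
    pairs = zip_longest(parts[0::2], parts[1::2], fillvalue="")
    joined = (body.strip() + delim for body, delim in pairs)
    return [s for s in joined if s]


sample = "Hello mr. Smith. How are you? Fine!"
print(split_discarding(sample))
print(split_keeping(sample))
```

If titles are capitalized in the real text ("Mr.", "Mrs."), passing `flags=re.IGNORECASE` to `re.split` is one way to make the lookbehinds match them too.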
    
  • 2021-01-26 21:38

    What about this:

    import re
    
    pattern = r'(?:(?<!mr)(?<!mrs)\.|\?|!)+'  # a period not preceded by mr/mrs, or ? or !, one or more times
    text_block = """long block of sentences"""
    
    sentences = re.split(pattern, text_block)
    

    sentences will be a list containing the resulting substrings. re.split will split text_block up into different elements of the returned list. It splits at each point where pattern matches.

    Read about re here:

    https://docs.python.org/2/howto/regex.html

    EDIT (moved from your closed newer question):

    If delimiters such as ? and ! are also being captured into the returned list, try removing the outer parentheses. The lookbehinds also need to come before the period, because a lookbehind placed after \. only ever sees the period itself:

    re.split(r"(?<!mr)(?<!mrs)\.|\?|!", somestring)
    

    For example, dropping the empty strings as well:

    sentences = [s for s in re.split(r"(?<!mr)(?<!mrs)\.|\?|!", somestring) if s]
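
The effect of the outer parentheses on `re.split` can be seen in a small sketch. The sample text and variable names are made up for illustration, and the pattern is simplified to plain `.`, `?` and `!` so that only the grouping differs.

```python
import re

text = "One. Two? Three!"

# A capturing group makes re.split include the matched delimiters in the result.
with_group = re.split(r"(\.|\?|!)", text)

# Without a capturing group, the delimiters are dropped.
without_group = re.split(r"\.|\?|!", text)

print(with_group)     # ['One', '.', ' Two', '?', ' Three', '!', '']
print(without_group)  # ['One', ' Two', ' Three', '']
```

The trailing empty string appears because the text ends with a delimiter, which is why the list comprehension above filters with `if s`.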
    