RegEx Get string between two strings that has line breaks

借酒劲吻你 2020-11-30 15:01

I have the following text (formatted just like below):


      My Class: TEST DATA
Test Section:
2 Answers
  • 2020-11-30 15:34

    Read the captured text from group 1:

    Test Section:([\S\s]*)</td>
    


    Note: change the closing delimiter (</td>) to suit your needs.

    Sample code:

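    import re
    # [\S\s] matches any character, including newlines, so no flag is needed.
    p = re.compile(r'Test Section:([\S\s]*)</td>')
    test_str = "..."

    # findall returns a list of the group-1 captures
    print(p.findall(test_str))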
    

    Pattern Explanation:

      Test Section:            'Test Section:'
      (                        group and capture to \1:
        [\S\s]*                  any character: non-whitespace (\S) or
                                 whitespace (\s, including \n, \r, \t,
                                 \f, and " "), 0 or more times, matching
                                 as many as possible (greedy)
      )                        end of \1
      </td>                    '</td>'
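
Because [\S\s]* is greedy, the capture runs up to the last </td> in the input, not the nearest one. The sketch below uses a made-up two-cell string (the html value is an assumption, not the asker's data) to compare the greedy pattern with a lazy variant, [\S\s]*?, which stops at the first </td>.

```python
import re

# Hypothetical input with two cells, chosen only to show the difference.
html = (
    "<td>Test Section: first\nvalue</td>\n"
    "<td>other cell</td>"
)

# Greedy: expands as far as possible, then backtracks to the LAST </td>.
greedy = re.findall(r"Test Section:([\S\s]*)</td>", html)

# Lazy: expands one character at a time and stops at the FIRST </td>.
lazy = re.findall(r"Test Section:([\S\s]*?)</td>", html)

print(greedy)  # [' first\nvalue</td>\n<td>other cell']
print(lazy)    # [' first\nvalue']
```

If the page may contain more than one cell, the lazy form is usually the safer choice.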
    
  • 2020-11-30 15:48

    Use the re.S flag (an alias of re.DOTALL), or prepend (?s) to the regular expression, so that . matches every character, including newlines.

    Without the flags, . does not match newline.

    (?s)(?<=Test)(.*?)(?=</td>)
    

    Example:

    >>> s = '''<td scope="row" align="left">
    ...       My Class: TEST DATA<br>
    ...       Test Section: <br>
    ...       MY SECTION<br>
    ...       MY SECTION 2<br>
    ...     </td>'''
    >>>
    >>> import re
    >>> re.findall('(?<=Test)(.*?)(?=</td>)', s)  # without flags
    []
    >>> re.findall('(?<=Test)(.*?)(?=</td>)', s, flags=re.S)
    [' Section: <br>\n      MY SECTION<br>\n      MY SECTION 2<br>\n    ']
    >>> re.findall('(?s)(?<=Test)(.*?)(?=</td>)', s)
    [' Section: <br>\n      MY SECTION<br>\n      MY SECTION 2<br>\n    ']
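
The same idea can be wrapped in a small reusable helper. This is only a sketch: the function name between and its parameters are hypothetical, and it assumes both markers are literal strings rather than patterns, which is why they are passed through re.escape.

```python
import re

def between(text, start, end):
    """Return each shortest span found between the literal markers start and end."""
    # re.escape treats the markers as literal text;
    # re.DOTALL lets . match newlines, so spans may cover several lines.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]

s = '''<td scope="row" align="left">
      My Class: TEST DATA<br>
      Test Section: <br>
      MY SECTION<br>
      MY SECTION 2<br>
    </td>'''

result = between(s, "Test Section:", "</td>")
print(result)
```

A capturing group is used here instead of lookarounds, so the start marker can be any length without running into the fixed-width restriction on lookbehind in the re module.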
    