I have the following text (formatted exactly as below):
My Class: TEST DATA
Test Section:
-
Get the matched group at index 1 from this pattern:
Test Section:([\S\s]*)</td>
Note: change the last part (</td>) to suit your input.
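Sample code (Python 3; the ur'' prefix in the original only works in Python 2):
import re
# [\S\s] already matches newlines, so no flag is needed (re.MULTILINE only changes ^ and $)
p = re.compile(r'Test Section:([\S\s]*)</td>')
test_str = u"..."
p.findall(test_str)  # returns a list of the group-1 captures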
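A self-contained sketch of the same approach. The question's full HTML isn't shown, so the input string below is a hypothetical sample modelled on it. It uses search() and group(1) to pull out the text captured between "Test Section:" and "</td>".

```python
import re

# Hypothetical sample input; the actual HTML from the question is not shown in full.
html = (
    '<td scope="row" align="left">\n'
    'My Class: TEST DATA<br>\n'
    'Test Section: <br>\n'
    'MY SECTION<br>\n'
    'MY SECTION 2<br>\n'
    '</td>'
)

# [\S\s] matches any character, including newline, so no flag is required.
pattern = re.compile(r'Test Section:([\S\s]*)</td>')

match = pattern.search(html)
section = match.group(1) if match else None  # text captured by group 1
print(repr(section))
```

findall() gives the same text wrapped in a list, one entry per match.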
Pattern explanation:
Test Section:   'Test Section:'
(               group and capture to \1:
  [\S\s]*       any character of: non-whitespace (all but \n, \r, \t, \f, and " ")
                or whitespace (\n, \r, \t, \f, and " "), 0 or more times,
                matching as many characters as possible (greedy)
)               end of \1
</td>           '</td>'
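Because [\S\s]* is greedy, the capture runs to the last </td> in the input, not the first. A small sketch on a hypothetical two-cell string shows the difference; the lazy form [\S\s]*? is a variant not used in the answer above.

```python
import re

# Hypothetical input with two cells; only the first contains "Test Section:".
html = '<td>Test Section: A</td><td>B</td>'

# Greedy: consumes as much as possible, then backtracks to the LAST "</td>".
greedy = re.findall(r'Test Section:([\S\s]*)</td>', html)

# Lazy: consumes as little as possible, stopping at the FIRST "</td>".
lazy = re.findall(r'Test Section:([\S\s]*?)</td>', html)

print(greedy)  # [' A</td><td>B']
print(lazy)    # [' A']
```

If the real page has more than one cell after "Test Section:", the lazy form is probably the one you want.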
-
Use the re.S flag (re.DOTALL is the same flag under a longer name), or prepend the regular expression with (?s), to make . match every character, including newline.
Without the flag, . does not match newline.
(?s)(?<=Test)(.*?)(?=</td>)
Example:
>>> s = '''<td scope="row" align="left">
... My Class: TEST DATA<br>
... Test Section: <br>
... MY SECTION<br>
... MY SECTION 2<br>
... </td>'''
>>>
>>> import re
>>> re.findall('(?<=Test)(.*?)(?=</td>)', s) # without flags
[]
>>> re.findall('(?<=Test)(.*?)(?=</td>)', s, flags=re.S)
[' Section: <br>\n MY SECTION<br>\n MY SECTION 2<br>\n ']
>>> re.findall('(?s)(?<=Test)(.*?)(?=</td>)', s)
[' Section: <br>\n MY SECTION<br>\n MY SECTION 2<br>\n ']
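The lookarounds (?<=...) and (?=...) check the surrounding text without consuming it, so findall() returns the whole match and no capturing group is needed. The sketch below uses a sample modelled on the session above but without its indentation, and a variant lookbehind (?<=Test Section:) so that " Section:" is not part of the result; it compares that with an equivalent capturing-group pattern.

```python
import re

# Sample modelled on the session above, without the leading indentation.
s = ('<td scope="row" align="left">\n'
     'My Class: TEST DATA<br>\n'
     'Test Section: <br>\n'
     'MY SECTION<br>\n'
     'MY SECTION 2<br>\n'
     '</td>')

# Lookarounds assert context without consuming it; findall returns the whole match.
with_lookarounds = re.findall(r'(?s)(?<=Test Section:).*?(?=</td>)', s)

# Capturing-group form: findall returns only group 1.
with_group = re.findall(r'(?s)Test Section:(.*?)</td>', s)

print(with_lookarounds)  # [' <br>\nMY SECTION<br>\nMY SECTION 2<br>\n']
print(with_group == with_lookarounds)  # True
```

Python's lookbehind must be fixed-width, which is why a literal such as "Test Section:" works there but a pattern like \w+ would not.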