Question
I'm parsing an HTML file into a well-formed XML document using the NekoHTML parser. However, I can't quite work out the GPath expression that identifies the table whose header contains the "Settings" string.
def parser = new org.cyberneko.html.parsers.SAXParser()
parser.setFeature('http://xml.org/sax/features/namespaces', false)
def html =
'''
<html>
<title>Hiya!</title>
</html>
<body>
<table>
<tr>
<th colspan='3'>Settings</th>
<td>First cell r1</td>
<td>Second cell r1</td>
</tr>
</table>
<table>
<tr>
<th colspan='3'>Other Settings</th>
<td>First cell r2</td>
<td>Second cell r2</td>
</tr>
</table>
'''
def slurper = new XmlSlurper(parser)
def page = slurper.parseText(html)
In this sample, the first table should be selected so that I can iterate over the other cell values in its row. Can someone help me with this GPath, please?
EDIT: Side question - why does
println page.HTML.HEAD.TITLE
print an empty string? Shouldn't it return the title?
Answer 1:
To get the table with 'Settings' in the header, you should be able to do:
def settingsTableNode = page.BODY.TABLE.find { table -> table.TBODY.TR.TH.text() == 'Settings' }
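As a rough, self-contained sketch of how that lookup behaves, the snippet below runs the same kind of GPath against a hand-written XML string rather than real NekoHTML output. The upper-case element names and the TBODY wrapper are assumptions that mimic what the answer's expression expects from the parser, and the import assumes Groovy 3 or later (on Groovy 2.x, XmlSlurper is available without it). Note that == only matches a header whose text is exactly 'Settings'; a looser match such as contains('Settings') would also pick up the 'Other Settings' table.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; on 2.x drop this line

// Hand-written stand-in for the parsed page (assumed shape, not real NekoHTML output).
def xml = '''
<HTML>
  <HEAD><TITLE>Hiya!</TITLE></HEAD>
  <BODY>
    <TABLE><TBODY><TR>
      <TH colspan="3">Settings</TH>
      <TD>First cell r1</TD>
      <TD>Second cell r1</TD>
    </TR></TBODY></TABLE>
    <TABLE><TBODY><TR>
      <TH colspan="3">Other Settings</TH>
      <TD>First cell r2</TD>
      <TD>Second cell r2</TD>
    </TR></TBODY></TABLE>
  </BODY>
</HTML>
'''

def page = new XmlSlurper().parseText(xml)

// Pick the first table whose header text is exactly 'Settings'.
def settingsTable = page.BODY.TABLE.find { table ->
    table.TBODY.TR.TH.text() == 'Settings'
}

// Collect the text of every data cell in the selected table.
def cells = settingsTable.TBODY.TR.TD.collect { it.text() }
println cells
```

Once the table node is found, the remaining cells can be walked with ordinary GPath navigation, as the collect call does here.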
page points to the root of the document, so you don't need the HTML. All you should need to do is:
println page.HEAD.TITLE
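The point about the root node can be checked with a small sketch like the one below. It parses a hand-written, already well-formed string (an assumption standing in for the NekoHTML result, with upper-case element names) and assumes Groovy 3 or later for the import.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; on 2.x drop this line

// Minimal stand-in document (assumed shape, not real NekoHTML output).
def page = new XmlSlurper().parseText(
    '<HTML><HEAD><TITLE>Hiya!</TITLE></HEAD><BODY/></HTML>')

// The slurper result is the root element itself.
def rootName = page.name()

// Navigating from the root reaches the title directly.
def title = page.HEAD.TITLE.text()

// page.HTML looks for an HTML child *inside* the root, which does not exist,
// so the path yields an empty result rather than an error.
def viaHtml = page.HTML.HEAD.TITLE

println rootName
println title
println viaHtml.size()
```

Because a GPath that matches nothing returns an empty result instead of throwing, a wrong path prints as an empty string, which is the behaviour described in the side question.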
Source: https://stackoverflow.com/questions/9260461/gpath-to-find-if-a-table-header-contains-a-matching-string