Question
For example, I have a bunch of <tr> tags I'd like to collect. I need to split each of these tags into individual elements, for easier parsing on my part.
Is this possible?
An example of the markup:
<tr class="first-in-year">
<td class="year">2011</td>
<td class="img"><a href="/battlefield-3/61-27006/"><img src=
"http://media.giantbomb.com/uploads/6/63038/1700748-bf3_thumb.jpg" alt=""></a></td>
<td class="title">
<a href="/battlefield-3/61-27006/">Battlefield 3</a>
<p class="deck">Battlefield 3 is DICE's next installment in the franchise and
will be on PC, PS3 and Xbox 360. The game will feature jets, prone, a
single-player and co-op campaign, and 64-player multiplayer (on PC). It's due out
in Fall of 2011.</p>
</td>
<td class="date">Expected: Q4 2011</td>
<td><a href="/pc/60-94/" class="PC">PC</a>, <a href="/xbox-360/60-20/" class=
"X360">X360</a>, <a href="/playstation-3/60-35/" class="PS3">PS3</a></td>
</tr>
<tr>
<td class="year"></td>
<td class="img"><a href="/forza-motorsport-4/61-33400/"><img src=
"http://media.giantbomb.com/uploads/0/1992/1654849-forza4_thumb.jpg" alt=
""></a></td>
<td class="title">
<a href="/forza-motorsport-4/61-33400/">Forza Motorsport 4</a>
<p class="deck">The next installment of Turn 10's racing franchise slated for
release in Fall 2011. It is set to feature 16 player online races, dynamic race
conditions, cars from over 80 manufacturers, and compatibility with Kinect, both
on and off the racetrack.</p>
</td>
<td class="date">Expected: Oct 2011</td>
<td><a href="/xbox-360/60-20/" class="X360">X360</a></td>
</tr>
<tr>
<td class="year"></td>
<td class="img"><a href="/max-payne-3/61-23398/"><img src=
"http://media.giantbomb.com/uploads/0/1400/938434-custom_1237811317319_mp3_poster_thumb.jpg"
alt=""></a></td>
<td class="title">
<a href="/max-payne-3/61-23398/">Max Payne 3</a>
<p class="deck">The long awaited third instalment in Remedy's beloved series, in
which an aging Max Payne faces one final chance to redeem himself.</p>
</td>
<td class="date">Expected: 2011</td>
<td><a href="/pc/60-94/" class="PC">PC</a>, <a href="/playstation-3/60-35/" class=
"PS3">PS3</a>, <a href="/xbox-360/60-20/" class="X360">X360</a></td>
</tr>
So I would have three elements here for this example. :)
Answer 1:
You can't split the document into multiple HTML documents on a tag, if that's what you mean. You can, however, select the individual <td> elements and parse those separately.
The XPath selector //td selects every <td> element in the document, and you can pass the resulting nodes into a parsing method.
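// LoadHtmlHowever() is a placeholder for however you load the document.
HtmlAgilityPack.HtmlDocument doc = LoadHtmlHowever();
// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlAgilityPack.HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//td");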
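Since the question asks for one element per <tr>, the same approach can be applied with //tr instead of //td. The following is only a minimal sketch, assuming the markup is available as a string; the names RowSplitter, SplitRows and ReadTitles are hypothetical and not part of HtmlAgilityPack, and the td.title lookup relies on the class names shown in the question's markup.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class RowSplitter
{
    // Hypothetical helper: returns the outer HTML of every <tr>, one string per row.
    public static List<string> SplitRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        // SelectNodes returns null when the XPath matches nothing.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            return result;
        }

        foreach (HtmlNode row in rows)
        {
            result.Add(row.OuterHtml);
        }
        return result;
    }

    // Hypothetical helper: reads the link text inside <td class="title"> for each row.
    public static List<string> ReadTitles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var titles = new List<string>();
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            return titles;
        }

        foreach (HtmlNode row in rows)
        {
            // The leading "." keeps the query relative to the current row;
            // a bare "//td" would search the whole document again.
            HtmlNode link = row.SelectSingleNode(".//td[@class='title']/a");
            if (link != null)
            {
                titles.Add(link.InnerText.Trim());
            }
        }
        return titles;
    }
}
```

Each HtmlNode returned for a row can be queried on its own with a relative XPath, so there is usually no need to turn the rows into separate documents first.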
Source: https://stackoverflow.com/questions/5970564/can-i-use-htmlagilitypack-to-split-an-html-document-on-a-certain-tag