Question
I have a web page whose HTML looks similar to the following:
<form name="test">
<td> .... </td>
.
.
.
<td> <A HREF="http://www.edu/st/file.html">alo</A> </td>
<td> <A HREF="http://www.dom/st/file.html">foo</A> </td>
<td> bla bla </td>
</form>
Now, I know only the value bla bla. Based on that value, can we track down or find the third-from-last <td> value (here, alo)? I could track those with the help of the HREF values, but the HREF values are not fixed; they can be anything at any time.
Answer 1:
Extracting every <td> from an HTML document is easy, but it's not a foolproof way to navigate the DOM. However, given the limitations of the sample HTML, here's a solution. I doubt it'll work in a real-world situation, though.
Mechanize uses Nokogiri internally for its heavy lifting, so require 'nokogiri' isn't necessary if you've already required Mechanize.
require 'nokogiri'
doc = Nokogiri::HTML::DocumentFragment.parse(<<EOT)
<td> <A HREF="http://www.edu/st/file.html">alo</A> </td>
<td> <A HREF="http://www.dom/st/file.html">foo</A> </td>
<td> bla bla </td>
EOT
doc.search('td')[-3].at('a')['href']
=> "http://www.edu/st/file.html"
How to get the Nokogiri document from the Mechanize "agent" is left as an exercise for the user.
Answer 2:
See http://nokogiri.org/. It helps you parse HTML and then find elements via selectors.
Source: https://stackoverflow.com/questions/14467164/is-it-possible-to-find-the-td-td-text-when-any-of-the-td-td-value