Question
Nokogiri isn't grabbing anything beneath the iframe tag. doc.search("iframe") returns only the iframe tag, doc.search("body.content-frame") returns an empty result, and doc.errors is empty as well. Why isn't Nokogiri registering the HTML beneath the iframe? How can I grab it?
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body onunload="clearMyTimeInterval()">
<iframe id="content-frame" frameborder="0" src="/sportsbook/betting-lines/baseball/2014-08-21/?range=day" onload="javascript:checkLoadedFrame(this);" style="background-color: rgb(34, 34, 34); height: 1875px;" name="content-frame" scrolling="no" border="0">
#document
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body class="content-frame">
#ETC.......
Answer 1:
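That's because the contents of the iframe are not part of the page. They live at a completely different location (note the src attribute of the iframe). The #document node in your snippet is how a browser's DOM inspector displays the frame's separately loaded document; it is not in the HTML that Nokogiri receives. You'll have to fetch that content separately, which is how a browser does it.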
Answer 2:
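Here is code that handles it with Mechanize (substitute your own page URL and iframe id):
require 'mechanize'
page = Mechanize.new.get "http://page_u_need"
# iframe_with finds the iframe by attribute; content fetches the page at its src
page.iframe_with(id: 'beatles').content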
Source: https://stackoverflow.com/questions/25436818/nokogiri-scraping-misses-html