How to get the raw HTML source code for a page by using Ruby or Nokogiri?

蓝咒 提交于 2019-12-04 12:53:33

Don't use Nokogiri at all if you want the raw source of a web page. Just fetch the web page directly as a string, and then do not feed that to Nokogiri. For example:

require 'open-uri'
html = open('http://phrogz.net').read
puts html.length #=> 8461
puts html        #=> ...raw source of the page...

If, on the other hand, you want the post-JavaScript-modified contents of a page (such as an AJAX library that executes JavaScript code to fetch new content and change the page), then you can't use Nokogiri. You need to use Ruby to control a web browser (e.g. read up on Selenium or Watir).

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!