Retrieve contents of URL as string

╄→гoц情女王★ submitted on 2019-11-29 00:08:35

Question


For tedious reasons to do with Hpricot, I need to write a function that is passed a URL, and returns the whole contents of the page as a single string.

I'm close. I know I need to use OpenURI, and it should look something like this:

require 'open-uri'
open(url) {
  # do something mysterious here to get page_string
}
puts page_string

Can anyone suggest what I need to add?


Answer 1:


The open method yields an IO-like representation of the resource to your block. You can read the whole body from it with the IO#read method:

open([mode [, perm]] [, options]) [{|io| ... }] 
open(path) { |io| data = io.read }
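
As a minimal sketch of the above, assuming Ruby 2.5 or later (where open-uri exposes URI.open): the object yielded to the block can be read in full, and open-uri also extends it with OpenURI::Meta, so response metadata such as the content type is available. The helper name page_with_type is hypothetical and not part of any library.

```ruby
require 'open-uri'

# Hypothetical helper: returns [body, content_type] for an http(s) URL.
def page_with_type(url)
  URI.open(url) do |io|
    # io is a StringIO or Tempfile extended with OpenURI::Meta
    [io.read, io.content_type]
  end
end
```

A call such as body, type = page_with_type('http://www.google.com') would need network access.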



Answer 2:


You can do the same without OpenURI:

require 'net/http'
require 'uri'

# Named fetch_page rather than open so it does not shadow Kernel#open
def fetch_page(url)
  Net::HTTP.get(URI.parse(url))
end

page_content = fetch_page('http://www.google.com')
puts page_content

Or, more succinctly:

Net::HTTP.get(URI.parse('http://www.google.com'))
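
One caveat: Net::HTTP.get returns the response body whatever the status code is, and it does not follow redirects, so a 301 or 404 page comes back as if it were the content. A minimal sketch of a stricter variant, using Net::HTTP.get_response and assuming that anything other than a 2xx response should raise (the helper name fetch_body is hypothetical):

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns the body of a 2xx response, raises otherwise.
def fetch_body(url)
  response = Net::HTTP.get_response(URI.parse(url))
  unless response.is_a?(Net::HTTPSuccess)
    raise "GET #{url} failed: #{response.code} #{response.message}"
  end
  response.body
end
```

Raising is only one possible policy; following the Location header of a redirect would be another.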



Answer 3:


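require 'open-uri'
# Assign the block's return value: a variable first set inside the block
# would not be visible after it.
# On Ruby 3.0 and later, call URI.open(url) instead of open(url).
page_string = open(url) do |f|
  f.read
end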

See also the documentation for the IO class.




Answer 4:


I was also unsure which approach performs better, so I benchmarked both:

require 'benchmark'
require 'net/http'
require "uri"
require 'open-uri'

url = "http://www.google.com"
Benchmark.bm do |x|
  x.report("net-http:")   { content = Net::HTTP.get_response(URI.parse(url)).body if url }
  x.report("open-uri:")   { open(url){|f| content =  f.read } if url }
end

The result:

              user     system      total        real
net-http:  0.000000   0.000000   0.000000 (  0.097779)
open-uri:  0.030000   0.010000   0.040000 (  0.864526)

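In this single run, Net::HTTP finished in about 0.10 s of wall-clock time against about 0.86 s for open-uri. One run against a live server is a noisy measurement, though, so which one to choose still depends on your requirements and on how you want to process the response.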
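
Benchmark.bm above times each block once, so a single slow DNS lookup or connection can dominate the figure. A minimal sketch of averaging several runs with Benchmark.realtime instead; the helper name average_realtime and the choice of a plain arithmetic mean are assumptions for illustration:

```ruby
require 'benchmark'

# Hypothetical helper: mean wall-clock seconds over `runs` executions of the block.
def average_realtime(runs)
  total = 0.0
  runs.times { total += Benchmark.realtime { yield } }
  total / runs
end

# Example calls (they need network access, so they are left commented out):
#   average_realtime(5) { Net::HTTP.get_response(URI.parse(url)).body }
#   average_realtime(5) { URI.open(url) { |f| f.read } }
```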




Answer 5:


To make the code a little clearer: OpenURI's open method returns the value of the block, so you can assign its return value directly to your variable. For example:

xml_text = open(url) { |io| io.read }
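
Applied to the original question, this gives a one-line function body. A minimal sketch, assuming Ruby 2.5 or later, where the open-uri entry point is URI.open (on Ruby 3.0 and later, plain open no longer accepts URLs); the function name page_string is hypothetical:

```ruby
require 'open-uri'

# Hypothetical helper: the whole page at url as a single string.
def page_string(url)
  # URI.open returns whatever the block returns, here the full body
  URI.open(url) { |io| io.read }
end
```

With network access, puts page_string('http://www.google.com') would print the page.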



Answer 6:


Try the following instead:

require 'open-uri' 
content = URI(your_url).read



Answer 7:



require 'open-uri'
str = open(url) { |f|  # url must specify the protocol
  f.read
}
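
The comment about the protocol matters because a string without a scheme, such as 'www.google.com', is not recognized as a URL by open-uri. A minimal sketch of checking this up front with the standard uri library; the helper name http_url? is hypothetical:

```ruby
require 'uri'

# Hypothetical helper: true only when url parses as an http or https URI.
def http_url?(url)
  URI.parse(url).is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
rescue URI::InvalidURIError
  false
end
```

A caller could test http_url?(url) before fetching and report a clear error when the scheme is missing.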


Source: https://stackoverflow.com/questions/3193538/retrieve-contents-of-url-as-string
