问题
For tedious reasons to do with Hpricot, I need to write a function that is passed a URL, and returns the whole contents of the page as a single string.
I'm close. I know I need to use OpenURI, and it should look something like this:
require 'open-uri'
open(url) {
# do something mysterious here to get page_string
}
puts page_string
Can anyone suggest what I need to add?
回答1:
The open
method passes an IO
representation of the resource to your block when it yields. You can read from it using the IO#read method
open([mode [, perm]] [, options]) [{|io| ... }]
open(path) { |io| data = io.read }
回答2:
You can do the same without OpenURI:
require 'net/http'
require 'uri'
def open(url)
Net::HTTP.get(URI.parse(url))
end
page_content = open('http://www.google.com')
puts page_content
Or, more succinctly:
Net::HTTP.get(URI.parse('http://www.google.com'))
回答3:
require 'open-uri'
open(url) do |f|
page_string = f.read
end
See also the documentation of IO class
回答4:
I was also very confused what to use for better performance and speedy results. I ran a benchmark for both to make it more clear:
require 'benchmark'
require 'net/http'
require "uri"
require 'open-uri'
url = "http://www.google.com"
Benchmark.bm do |x|
x.report("net-http:") { content = Net::HTTP.get_response(URI.parse(url)).body if url }
x.report("open-uri:") { open(url){|f| content = f.read } if url }
end
Its result is:
user system total real
net-http: 0.000000 0.000000 0.000000 ( 0.097779)
open-uri: 0.030000 0.010000 0.040000 ( 0.864526)
I'd like to say that it depends on what your requirement is and how you want to process.
回答5:
To make code a little clearer, the OpenURI open
method will return the value returned by the block, so you can assign open
's return value to your variable. For example:
xml_text = open(url) { |io| io.read }
回答6:
Try the following instead:
require 'open-uri'
content = URI(your_url).read
回答7:
require 'open-uri'
open(url) {|f| #url must specify the protocol
str = f.read()
}
来源:https://stackoverflow.com/questions/3193538/retrieve-contents-of-url-as-string