I am currently using OpenURI to download a file in Ruby. Unfortunately, it seems impossible to get the HTTP headers without downloading the full file:
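require 'open-uri'
require 'ruby-progressbar'

pbar = nil # declared outside the lambdas so both of them share it
URI.open(base_url,
  :content_length_proc => lambda {|t|
    if t && 0 < t
      pbar = ProgressBar.create(:total => t)
    end
  },
  :progress_proc => lambda {|s|
    pbar.progress = s if pbar
  }) {|io|
  puts io.size
  puts io.meta['content-disposition']
}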
Running the code above shows that it first downloads the full file and only then prints the header I need.
Is there a way to get the headers before the full file is downloaded, so I can cancel the download if the headers are not what I expect them to be?
You can use Net::HTTP for this. For example:
require 'net/http'
http = Net::HTTP.start('stackoverflow.com')
resp = http.head('/')
resp.each { |k, v| puts "#{k}: #{v}" }
http.finish
Another example, this time getting the headers of the wonderful book Object-Oriented Programming With ANSI-C:
require 'net/http'
http = Net::HTTP.start('www.planetpdf.com')
resp = http.head('/codecuts/pdfs/ooc.pdf')
resp.each { |k, v| puts "#{k}: #{v}" }
http.finish
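The HEAD request above can be turned into a check-then-download flow. This is only a sketch under some assumptions: the method name `download_if_acceptable`, the `max_bytes:` limit, the returned symbols, and the rule that a Content-Disposition header must be present are all made up for illustration. It sends HEAD first and issues the GET only when the headers pass the checks.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: inspect the headers with HEAD, then GET only if they
# pass the checks. Returns a symbol describing what happened.
def download_if_acceptable(url, dest, max_bytes:)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    head_response = http.head(uri.request_uri)
    return :bad_status unless head_response.is_a?(Net::HTTPSuccess)

    length = head_response.content_length # Integer, or nil when the header is absent
    return :unknown_size if length.nil?
    return :too_large if length > max_bytes
    # Illustrative check: only accept responses that name an attachment.
    return :no_disposition unless head_response['content-disposition']

    # The headers look fine, so fetch the body and stream it to disk.
    http.request_get(uri.request_uri) do |response|
      File.open(dest, 'wb') do |file|
        response.read_body { |chunk| file.write(chunk) }
      end
    end
    :downloaded
  end
end
```

This costs an extra round trip, and servers are expected, but not strictly required, to send the same headers for HEAD as for GET, so treat the HEAD result as a hint rather than a guarantee.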
It seems what I wanted is not possible to achieve with OpenURI, at least not, as I said, without loading the whole file first.
I was able to do what I wanted using Net::HTTP's request_get.
Here is an example:
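require 'net/http'

max_length = 10 * 1024 * 1024             # hypothetical limit: 10 MB
http = Net::HTTP.start('www.example.com') # hypothetical host
http.request_get('/largefile.jpg') {|response|
  length = response.content_length # Integer, or nil without the header
  if length && length < max_length
    File.open('largefile.jpg', 'wb') do |file|
      response.read_body {|str| file.write(str) } # read body now
    end
  end
}
http.finish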
Note that this only works with the block form. If you call it like this:
response = http.request_get('/largefile.jpg')
the body will already have been read by the time request_get returns.
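A sketch of the request_get approach with the cancel step made explicit. One detail worth knowing: as far as I can tell from net/http's source, if the block returns without calling read_body, Net::HTTP reads the remaining body itself once the block finishes, so simply skipping read_body does not stop the transfer. Raising an exception from the block does abandon it, because Net::HTTP closes the socket when an error escapes the request. `DownloadRejected`, `get_if_small` and `max_bytes:` are names invented for this example, and treating a missing Content-Length as a rejection is an assumption.

```ruby
require 'net/http'
require 'uri'

# Hypothetical error class used only to abort the request from inside the block.
class DownloadRejected < StandardError; end

# Hypothetical helper: GET the resource, look at the headers, and either
# stream the body to `dest` (returns true) or abort the transfer (returns false).
def get_if_small(url, dest, max_bytes:)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      length = response.content_length # nil when the header is absent
      # Raising here, before read_body, makes Net::HTTP close the socket
      # instead of reading the rest of the body.
      raise DownloadRejected, "Content-Length: #{length.inspect}" if length.nil? || length > max_bytes

      File.open(dest, 'wb') do |file|
        response.read_body { |chunk| file.write(chunk) }
      end
    end
  end
  true
rescue DownloadRejected
  false
end
```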
Rather than use Net::HTTP, which can be like digging a pool on the beach with a sand shovel, you can use one of the many HTTP clients available for Ruby and clean up the code.
Here's a sample using HTTParty:
require 'httparty'
resp = HTTParty.head('http://example.org')
resp.headers
# => {"accept-ranges"=>["bytes"], "cache-control"=>["max-age=604800"], "content-type"=>["text/html"], "date"=>["Thu, 02 Mar 2017 18:52:42 GMT"], "etag"=>["\"359670651\""], "expires"=>["Thu, 09 Mar 2017 18:52:42 GMT"], "last-modified"=>["Fri, 09 Aug 2013 23:54:35 GMT"], "server"=>["ECS (oxr/83AB)"], "x-cache"=>["HIT"], "content-length"=>["1270"], "connection"=>["close"]}
At that point it's easy to check the size of the document:
resp.headers['content-length'] # => "1270"
Unfortunately, the HTTPd you're talking to might not know how big the content will be. To respond quickly, servers don't necessarily calculate the size of dynamically generated output, since that would take almost as long and be almost as CPU-intensive as actually sending it. Code that relies on the "content-length" value can therefore break when the header is missing.
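When Content-Length is missing or can't be trusted, one option is to enforce the limit on the bytes actually received instead. A minimal sketch, assuming a byte limit like the one above (`stream_with_limit`, `BodyTooLarge` and `max_bytes:` are hypothetical names): it streams the body to disk, counts bytes as chunks arrive, and abandons the transfer and deletes the partial file as soon as the running total passes the limit.

```ruby
require 'net/http'
require 'uri'

# Hypothetical error raised when the streamed body grows past the limit.
class BodyTooLarge < StandardError; end

# Hypothetical helper: returns the number of bytes written, or nil if the
# body exceeded max_bytes (in which case the partial file is removed).
def stream_with_limit(url, dest, max_bytes:)
  uri = URI(url)
  received = 0
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      File.open(dest, 'wb') do |file|
        response.read_body do |chunk|
          received += chunk.bytesize
          # Stop as soon as the limit is passed, whatever the headers said.
          raise BodyTooLarge, "more than #{max_bytes} bytes" if received > max_bytes
          file.write(chunk)
        end
      end
    end
  end
  received
rescue BodyTooLarge
  File.delete(dest) if File.exist?(dest) # drop the partial file
  nil
end
```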
The problem with Net::HTTP is that it won't handle redirects automatically, so you have to add code for that. Granted, that code is supplied in the documentation, but it keeps growing as you need to do more things, until you've ended up writing yet another HTTP client (YAHC). So avoid that and use an existing wheel.
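For reference, the redirect handling meant here looks roughly like the following. It is a sketch in the spirit of the redirect-following example in the Net::HTTP documentation, not a copy of it; `head_following_redirects` and its `limit:` hop count are hypothetical. It re-issues HEAD against each Location header, resolving relative values against the current URL, and gives up after `limit` redirects.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: HEAD the URL, following up to `limit` redirects.
# Returns the final Net::HTTPResponse.
def head_following_redirects(url, limit: 5)
  raise 'too many redirects' if limit.negative?

  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end

  if response.is_a?(Net::HTTPRedirection) && response['location']
    # Location may be relative, so resolve it against the current URL.
    next_url = URI.join(uri.to_s, response['location']).to_s
    head_following_redirects(next_url, limit: limit - 1)
  else
    response
  end
end
```

Each further requirement (timeouts, cookies, authentication) adds more code of this kind.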
Source: https://stackoverflow.com/questions/17454956/how-to-get-http-headers-before-downloading-with-rubys-openuri