open-uri

ruby reading files from S3 with open-URI

时光总嘲笑我的痴心妄想 submitted on 2019-12-08 07:31:04
Question: I'm having some problems reading a file from S3. I want to load the ID3 tags remotely, but using open-uri doesn't work; it gives me the following error:

    ruby-1.8.7-p302 > c=TagLib2::File.new(open(URI.parse("http://recordtemple.com.s3.amazonaws.com/music/745/original/The%20Stranger.mp3?1292096514")))
    TypeError: can't convert Tempfile into String
        from (irb):8:in `initialize'
        from (irb):8:in `new'
        from (irb):8

However, if I download the same file and put it on my desktop (i.e. no need
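The TypeError in the excerpt above suggests that the constructor wants a file path (a String) and was handed the Tempfile object that open-uri returned. A minimal sketch of one way around this, assuming the tag library only needs a path on disk: `with_local_path` is a hypothetical helper name, not part of open-uri or TagLib2. It also covers the case where open-uri returns a StringIO (which it does for small responses) by spilling the data to a temporary file first.

```ruby
require 'tempfile'
require 'stringio'

# Hypothetical helper: yield a filesystem path for whatever open-uri returned.
# Tempfile objects already have a path; StringIO objects are written to disk first.
def with_local_path(io)
  path = io.respond_to?(:path) ? io.path : nil
  return yield(path) if path

  # Tempfile.create removes the file once the block finishes.
  Tempfile.create('download') do |tmp|
    tmp.binmode
    tmp.write(io.read)
    tmp.flush
    yield tmp.path
  end
end
```

With this in place, the call from the question would become something like `with_local_path(open(url)) { |path| TagLib2::File.new(path) }`, under the assumption that `TagLib2::File.new` accepts a path string as the error message implies.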

open-uri and sax parsing for a giant xml document

被刻印的时光 ゝ submitted on 2019-12-07 23:33:22
Question: I need to connect to an external XML file (300MB+) to download and process it, then run through the XML document and save elements in the database. I am already doing this without problems on a production server, using Saxerator to be gentle on memory. It works great. Here is my issue now: I need to use open-uri (though there could be alternative solutions?) to grab the file to parse. The problem is that open-uri has to load the whole file before anything starts parsing, which defeats the

Stop file write if file size exceeds 500KB ruby on rails

倖福魔咒の submitted on 2019-12-06 14:32:33
Question: How can I stop writing a file (uploaded from a remote URL) when the file size exceeds 500KB? I am using the following code to upload a remote file:

    require 'open-uri'
    open('temp/demo.doc', 'wb') do |file|
      file << open('http://example.com/demo.doc').read
    end

This code works properly and I am able to get files in the temp folder. Now, if the file size exceeds 500KB, it should stop writing the file. In other words, I want only 500KB of the file if it is larger than 500KB.

Answer 1: IO#read takes a bytes argument, so
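The answer's observation that IO#read accepts a byte count can be sketched as follows. This is only an illustration: `copy_capped` and `MAX_BYTES` are hypothetical names, and the helper works on any IO-like object that responds to `read(length)`, which includes what open-uri returns.

```ruby
require 'stringio'

MAX_BYTES = 500 * 1024 # the 500 KB cap from the question

# Hypothetical helper: copy at most `limit` bytes from `source` into `dest`
# and return the number of bytes written.
def copy_capped(source, dest, limit = MAX_BYTES)
  data = source.read(limit) # with a length argument, read returns nil at EOF
  dest.write(data) if data
  data ? data.bytesize : 0
end
```

Applied to the question's code, the inner expression would become `copy_capped(open('http://example.com/demo.doc'), file)`. Note that open-uri buffers the whole response before returning, so this caps what is written to disk, not what is transferred over the network.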

Ruby 2 Upgrade Breaks Nokogiri and/or open-uri Encoding?

好久不见. submitted on 2019-12-06 05:42:13
I have a mystery to solve when upgrading our Rails 3.2, Ruby 1.9 app to Rails 3.2, Ruby 2.1.2. Nokogiri seems to break, in that it changes its behavior when used with open-uri. No gem versions changed, just the Ruby version (this is all on OS X Mavericks, using brew, gcc4, etc.). Steps to reproduce:

    $ ruby -v
    ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-darwin13.1.0]
    $ rails console
    Connecting to database specified by database.yml
    Loading development environment (Rails 3.2.18)
    > feed = Nokogiri::XML(open(URI.encode("http://anyblog.wordpress.org/feed/")))
    => #(Document:0x3fcb82f08448 { name =

Using Open-URI to fetch XML and the best practice in case of problems with a remote url not returning/timing out?

给你一囗甜甜゛ submitted on 2019-12-06 05:19:10
Question: The current code works as long as there is no remote error:

    def get_name_from_remote_url
      cstr = "http://someurl.com"
      getresult = open(cstr, "UserAgent" => "Ruby-OpenURI").read
      doc = Nokogiri::XML(getresult)
      my_data = doc.xpath("/session/name").text # => 'Fred' or 'Sam' etc
      return my_data
    end

But what if the remote URL times out or returns nothing? How do I detect that and return nil, for example? And does open-uri give a way to define how long to wait before giving up? This method is called while

Adjusting timeouts for Nokogiri connections

跟風遠走 submitted on 2019-12-06 04:41:04
Why does Nokogiri wait a couple of seconds (3-5) when the server is busy and I'm requesting pages one by one, but when these requests are in a loop, Nokogiri does not wait and throws the timeout message? I'm wrapping the request in a timeout block, but Nokogiri does not wait for that time at all. Any suggested procedure for this?

    # this is a method from the eng class
    def get_page(url,page_type)
      begin
        timeout(10) do
          # Get a Nokogiri::HTML::Document for the page we’re interested in...
          @@doc = Nokogiri::HTML(open(url))
        end
      rescue Timeout::Error
        puts "Time out connection request"
        raise
      end
    end
    # this

How to Process Items in an Array in Parallel using Ruby (and open-uri)

老子叫甜甜 submitted on 2019-12-05 17:47:31
Question: I am wondering how I can open multiple concurrent connections using open-uri. I think I need to use threads or fibers somehow, but I'm not sure. Example code:

    def get_doc(url)
      begin
        Nokogiri::HTML(open(url).read)
      rescue Exception => ex
        puts "Failed at #{Time.now}"
        puts "Error: #{ex}"
      end
    end

    array_of_urls_to_process = [......]
    # How can I iterate over items in the array in parallel (instead of one at a time?)
    array_of_urls_to_process.each do |url|
      x = get_doc(url)
      do_something(x
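The threading idea raised in the question above could look roughly like this. It is a minimal sketch, not the accepted answer: `fetch_all` is a hypothetical helper that starts one thread per URL, which is only reasonable for short lists; a larger list would call for a bounded worker pool.

```ruby
# Hypothetical helper: run the given block on each URL in its own thread
# and return the results in the same order as the input array.
def fetch_all(urls, &fetch)
  threads = urls.map do |url|
    Thread.new(url) do |u|
      begin
        fetch.call(u)
      rescue StandardError => e
        # A failure in one thread is logged and recorded as nil.
        warn "Failed #{u} at #{Time.now}: #{e.message}"
        nil
      end
    end
  end
  threads.map(&:value) # value waits for the thread and returns its result
end
```

In the question's setting the call might read `docs = fetch_all(array_of_urls_to_process) { |url| get_doc(url) }`. Threads help here because the work is dominated by waiting on network I/O.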

Ruby Proxy Authentication GET/POST with OpenURI or net/http

天涯浪子 submitted on 2019-12-05 07:22:09
I'm using Ruby 1.9.3 and trying to use open-uri to GET a URL and Net::HTTP to POST. I'm trying to use proxy authentication for both. Trying to do a POST request with net/http:

    require 'net/http'
    require 'open-uri'
    http = Net::HTTP.new("google.com", 80)
    headers = { 'User-Agent' => 'Ruby 193'}
    resp, data = http.post("/", "name1=value1&name2=value2", headers)
    puts data

And for open-uri, which I can't get to do POST, I use:

    data = open("http://google.com/","User-Agent"=> "Ruby 193").read

How would I modify these to use a proxy with HTTP authentication? I've tried (for open-uri):

    data = open(

Display HTTP headers using Open::URI?

懵懂的女人 submitted on 2019-12-05 03:52:28
With open-uri, I can do the following:

    require 'open-uri'
    #check status
    open('http://google.com').status
    #get entire html
    open('http://google.com').read

Is it possible to get the HTTP headers of a request so things can be debugged, something like curl's curl -I http://google.com?

    $ curl -I google.com
    HTTP/1.1 301 Moved Permanently
    Location: http://www.google.com/
    Content-Type: text/html; charset=UTF-8
    Date: Mon, 17 Dec 2012 14:28:17 GMT
    Expires: Wed, 16 Jan 2013 14:28:17 GMT
    Cache-Control: public, max-age=2592000
    Server: gws
    Content-Length: 219
    X-XSS-Protection: 1; mode=block
    X-Frame-Options
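For the header question above, the object open-uri returns is extended with OpenURI::Meta, whose `meta` method gives the response headers as a hash and whose `status` method gives the status code and message. A small sketch of printing them in a curl-like layout follows; `header_dump` is a hypothetical helper name.

```ruby
require 'open-uri'

# Hypothetical helper: format the status and headers of an open-uri response.
# `response` is any object exposing OpenURI::Meta's `status` and `meta`.
def header_dump(response)
  lines = [response.status.join(' ')] # status is a pair like ["200", "OK"]
  response.meta.each { |name, value| lines << "#{name}: #{value}" }
  lines.join("\n")
end
```

Usage would be along the lines of `puts header_dump(open('http://google.com'))`. Two differences from `curl -I` are worth keeping in mind: open-uri issues a GET rather than a HEAD request, and it follows redirects, so the headers shown belong to the final response rather than to the 301.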

HTML is read before fully loaded using open-uri and nokogiri

笑着哭i submitted on 2019-12-04 04:30:13
I'm using open-uri and Nokogiri with Ruby to do some simple web crawling. One problem is that sometimes the HTML is read before it is fully loaded. In such cases, I cannot fetch any content other than the loading icon and the nav bar. What is the best way to tell open-uri or Nokogiri to wait until the page is fully loaded? Currently my script looks like:

    require 'nokogiri'
    require 'open-uri'
    url = "https://www.the-page-i-wanna-crawl.com"
    doc = Nokogiri::HTML(open(url, ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE))
    puts doc.at_css("h2").text

What you describe is not possible. The result of open