Retrieve contents of URL as string

温柔的废话 2020-12-25 12:57

For tedious reasons to do with Hpricot, I need to write a function that is passed a URL, and returns the whole contents of the page as a single string.

I'm close.

7 answers
  • 2020-12-25 13:27

    When you give open a block, it yields an IO representation of the resource to that block. You can read from it using the IO#read method:

    open([mode [, perm]] [, options]) [{|io| ... }]

    require 'open-uri'
    # On Ruby 3.0+, Kernel#open no longer accepts URLs; use URI.open.
    data = URI.open(url) { |io| io.read }
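
Applied to the original question, the block form can be wrapped in a small function. This is a minimal sketch, not the only way to do it: `fetch_page` is a hypothetical name, and it assumes the URL is reachable over HTTP or HTTPS.

```ruby
require 'open-uri'

# Hypothetical helper: returns the whole response body at `url` as one String.
# URI.open is used because Kernel#open stopped accepting URLs in Ruby 3.0.
def fetch_page(url)
  # The block's value (the String from io.read) becomes URI.open's return value.
  URI.open(url) { |io| io.read }
end
```

Calling `fetch_page("http://www.google.com")` would then hand back the page as a single string, ready to pass to a parser.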
    
  • 2020-12-25 13:31

    Try the following instead:

    require 'open-uri' 
    content = URI(your_url).read
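
As a rough illustration of what `URI(...).read` hands back: open-uri extends the returned String with `OpenURI::Meta`, so response metadata is available on the string itself. The helper name `fetch_with_meta` and the hash layout are assumptions made for this sketch.

```ruby
require 'open-uri'

# Hypothetical helper: reads the page and reports a little response metadata.
def fetch_with_meta(url)
  content = URI(url).read # a String extended with OpenURI::Meta
  {
    body: content,
    content_type: content.content_type, # media type from the Content-Type header
    charset: content.charset            # charset parameter from the same header
  }
end
```

The `:body` value is still an ordinary String, so it can be used anywhere the plain result of `read` would be.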
    
  • 2020-12-25 13:33
    
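    require 'open-uri'
    # url must specify the protocol (e.g. http://)
    # URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
    # Assign the block's value: a variable first set inside the block is not visible outside it.
    str = URI.open(url) { |f| f.read }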
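
The requirement that the URL specify the protocol can be checked up front. The following is a sketch under assumptions: `read_http_url` is a hypothetical name, and rejecting everything except http and https is a choice made for this example, not something open-uri enforces.

```ruby
require 'open-uri'

# Hypothetical helper: fails early with a clear message when the scheme is missing.
def read_http_url(url)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  # A string such as "www.example.com" parses to a URI::Generic and is rejected.
  unless uri.is_a?(URI::HTTP)
    raise ArgumentError, "expected an http:// or https:// URL, got #{url.inspect}"
  end
  uri.read
end
```

With this in place, a scheme-less URL raises an ArgumentError before any network request is attempted.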
    
  • 2020-12-25 13:35

    I was also unsure which approach performs better, so I ran a benchmark comparing net/http and open-uri:

    require 'benchmark'
    require 'net/http'
    require "uri"
    require 'open-uri'
    
    url = "http://www.google.com"
    Benchmark.bm do |x|
      x.report("net-http:")   { content = Net::HTTP.get_response(URI.parse(url)).body if url }
      x.report("open-uri:")   { URI.open(url) { |f| content = f.read } if url }  # URI.open: Kernel#open rejects URLs on Ruby 3.0+
    end
    

    The result:

                  user     system      total        real
    net-http:  0.000000   0.000000   0.000000 (  0.097779)
    open-uri:  0.030000   0.010000   0.040000 (  0.864526)
    

    Which one to use depends on your requirements and on how you want to process the response.
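
One difference to weigh alongside the timings: open-uri follows HTTP redirects by default, while `Net::HTTP.get_response` returns the redirect response itself. A sketch of a Net::HTTP-based fetch that follows redirects manually might look like the following; `fetch_with_net_http` and the limit of 5 hops are assumptions for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns the body as a String, following up to `limit` redirects.
def fetch_with_net_http(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  response = Net::HTTP.get_response(URI.parse(url))
  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Location may be relative, so resolve it against the current URL.
    fetch_with_net_http(URI.join(url, response['location']).to_s, limit - 1)
  else
    response.value # raises an exception for non-2xx responses
  end
end
```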

  • 2020-12-25 13:40
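    require 'open-uri'
    # URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
    # Assign the block's value: a variable first set inside the block is not visible outside it.
    page_string = URI.open(url) do |f|
      f.read
    end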
    

    See also the documentation of the IO class.

  • 2020-12-25 13:42

    To make the code a little clearer: the OpenURI open method returns the value returned by the block, so you can assign its return value directly to your variable. For example (written with URI.open, which Ruby 3.0+ requires for URLs):

    xml_text = URI.open(url) { |io| io.read }
    