How to get a remote-file's mtime before downloading it in Ruby?

名媛妹妹 2021-02-06 14:57

I have the code below, which simply downloads a file and saves it. I want to run it every 30 seconds, check whether the remote file's mtime has changed, and download the file only if it has.

3 Answers
  • 2021-02-06 15:17

    You can send an If-Modified-Since header whose value is a correctly formatted HTTP date.

    If the server supports conditional requests, it answers with a bare 304 Not Modified status (no body) when the file is unchanged, or with the full content when the file has been modified since that date.
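A minimal sketch of such a conditional request, assuming the server honors If-Modified-Since. The method names `conditional_get` and `fetch_if_modified` are made up for illustration; `Time#httpdate` (from the `time` standard library) produces the GMT-based date format that HTTP expects.

```ruby
require 'net/http'
require 'time' # adds Time#httpdate

# Hypothetical helper: build a GET request that carries If-Modified-Since
# when a timestamp of the local copy is known.
def conditional_get(uri, since = nil)
  req = Net::HTTP::Get.new(uri)
  req['If-Modified-Since'] = since.httpdate if since
  req
end

# Hypothetical helper: returns the response body, or nil when the server
# answers 304 Not Modified.
def fetch_if_modified(uri, since = nil)
  res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(conditional_get(uri, since))
  end
  case res
  when Net::HTTPNotModified then nil
  when Net::HTTPSuccess     then res.body
  else res.value # raises an exception for any other non-2xx response
  end
end
```

Calling `fetch_if_modified(uri, File.mtime(path))` would then return `nil` whenever the server reports that nothing has changed since the local copy was written.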

  • 2021-02-06 15:23

    The official Net::HTTP docs for Ruby 2.6.5 have a concrete example of If-Modified-Since, the header mentioned in https://stackoverflow.com/a/1509202/895245

    uri = URI('http://example.com/cached_response')
    file = File.stat 'cached_response'
    
    req = Net::HTTP::Get.new(uri)
    req['If-Modified-Since'] = file.mtime.rfc2822
    
    res = Net::HTTP.start(uri.hostname, uri.port) {|http|
      http.request(req)
    }
    
    open 'cached_response', 'w' do |io|
      io.write res.body
    end if res.is_a?(Net::HTTPSuccess)
    

    Here is a full script that actually runs:

    #!/usr/bin/env ruby
    
    require 'net/http'
    require 'time' # provides Time#rfc2822
    
    uri = URI('https://upload.wikimedia.org/wikipedia/commons/thumb/9/95/Illumina_iSeq_100_flow_cell_top.jpg/451px-Illumina_iSeq_100_flow_cell_top.jpg')
    file_path = 'cached_response'
    req = Net::HTTP::Get.new(uri)
    # Only send the conditional header when a previously downloaded copy exists.
    if File.file?(file_path)
      req['If-Modified-Since'] = File.stat(file_path).mtime.rfc2822
    end
    res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
      http.request(req)
    end
    # A 304 Not Modified response is not a Net::HTTPSuccess, so the file is left alone.
    if res.is_a?(Net::HTTPSuccess)
      # Write in binary mode because the body is a JPEG.
      File.binwrite(file_path, res.body)
    end
    

    However (TODO), it still rewrites the file on every run, even though Wikimedia appears to honor If-Modified-Since: https://wikitech.wikimedia.org/wiki/MediaWiki_caching
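One possible explanation, which is a guess and has not been verified against Wikimedia, is the date format. `Time#rfc2822` emits a numeric zone offset, whereas HTTP dates are expressed in GMT, which is what `Time#httpdate` produces. The sketch below only compares the two formats for a hypothetical mtime in a UTC+1 zone; it makes no network request.

```ruby
require 'time'

# Hypothetical file mtime, expressed in a UTC+1 local zone.
mtime = Time.new(2021, 2, 6, 15, 23, 0, '+01:00')

rfc2822_value  = mtime.rfc2822   # "Sat, 06 Feb 2021 15:23:00 +0100"
httpdate_value = mtime.httpdate  # "Sat, 06 Feb 2021 14:23:00 GMT"

puts rfc2822_value
puts httpdate_value
```

If the format is indeed the cause, replacing `.rfc2822` with `.httpdate` in the script would be the change to try.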

  • 2021-02-06 15:35

    Before you call http.get, call http.head, which requests only the headers without downloading the body (i.e. the file contents). Then check whether the value of the Last-Modified header has changed.

    For example:

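    # Assumes `http` is an open Net::HTTP session and that
    # `previous_last_modified` holds the header value from the last check.
    resp = http.head($xmlServerPath + "levels.xml")
    last_modified = resp['last-modified']
    if last_modified != previous_last_modified
      # file has changed
    end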
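Combined with the 30-second interval from the question, the HEAD check could be wrapped in a polling loop along these lines. This is a sketch, not a tested production loop: the method names `remote_mtime`, `changed?` and `poll` are invented here, and it assumes the server actually sends a Last-Modified header in HTTP-date format (`Time.httpdate` raises ArgumentError otherwise).

```ruby
require 'net/http'
require 'time' # adds Time.httpdate

# Hypothetical helper: return the Last-Modified header of +uri+ as a Time,
# or nil when the server does not send one. Only a HEAD request is made.
def remote_mtime(uri)
  res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
  header = res['last-modified']
  header && Time.httpdate(header)
end

# True when the remote timestamp is known and differs from the one last seen.
def changed?(current, previous)
  !current.nil? && current != previous
end

# Poll forever, yielding the new timestamp each time it changes.
# The first successful check always yields, because nothing has been seen yet.
def poll(uri, interval: 30)
  previous = nil
  loop do
    current = remote_mtime(uri)
    if changed?(current, previous)
      yield current
      previous = current
    end
    sleep interval
  end
end
```

A caller would pass the download step as a block, for example `poll(uri) { |mtime| download(uri) }`, where `download` stands for the existing download-and-save code.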
    