CherryPy - Caching of static files

最后都变了 - Submitted on 2019-12-13 02:12:51

Question


I have a server that serves a large amount of static content. The CherryPy tool tools.gzip is enabled to compress the files whenever gzip content is supported.

Question: Is CherryPy gzipping the static files every time they are requested, or does it gzip the content once and serve that gzipped copy to all requests?

If CherryPy is currently gzipping the files every time they are requested, would enabling tools.caching prevent that, or is there a better way?


Answer 1:


First, note that although HTTP seems easy because it is so widespread and every language has good client libraries, it is in fact a complex protocol that involves multiple interacting tiers. Caching is no exception; see RFC 2616 Section 13. What follows covers only Last-Modified/If-Modified-Since, because ETag validation combined with gzip is another story.

Setup

#!/usr/bin/env python
# -*- coding: utf-8 -*-


import os

import cherrypy


path   = os.path.abspath(os.path.dirname(__file__))
config = {
  'global' : {
    'server.socket_host' : '127.0.0.1',
    'server.socket_port' : 8080,
    'server.thread_pool' : 8
  },
  '/static' : {
    'tools.gzip.on'       : True,
    'tools.staticdir.on'  : True,
    'tools.staticdir.dir' : os.path.join(path, 'static')
  }
}


if __name__ == '__main__':
  cherrypy.quickstart(config = config)

Then put a plain-text or HTML file in the static directory.

Experiment

Firefox and Chromium don't send cache-related headers on the first request, i.e. GET /static/some.html:

Accept-Encoding: gzip, deflate
Host: 127.0.0.1:8080

Response:

Accept-Ranges: bytes
Content-Encoding: gzip
Content-Length: 50950
Content-Type: text/html
Date: Mon, 15 Dec 2014 12:32:40 GMT
Last-Modified: Wed, 22 Jan 2014 09:22:27 GMT
Server: CherryPy/3.6.0
Vary: Accept-Encoding

On subsequent requests, the browser avoids the network entirely and relies on the following cache entry (as shown by Firebug):

Data Size: 50950
Device: disk
Expires: Sat Jan 17 2015 05:39:41 GMT
Fetch Count: 6
Last Fetched: Mon Dec 15 2014 13:19:45 GMT
Last Modified: Mon Dec 15 2014 13:19:44 GMT

Because CherryPy doesn't provide an expiration time (Expires or Cache-Control) by default, Firefox (and likely Chromium too) uses a heuristic, as permitted by RFC 2616 Section 13.2.4:

If none of Expires, Cache-Control: max-age, or Cache-Control: s-maxage (see section 14.9.3) appears in the response, and the response does not include other restrictions on caching, the cache MAY compute a freshness lifetime using a heuristic...

Also, if the response does have a Last-Modified time, the heuristic expiration value SHOULD be no more than some fraction of the interval since that time. A typical setting of this fraction might be 10%.

Here's code that demonstrates the heuristic nature of the Expires value:

import email.utils
import datetime

s  = 'Wed, 22 Jan 2014 09:22:27 GMT'
lm = datetime.datetime(*email.utils.parsedate(s)[0:6])

# Heuristic expiry: now plus 10% of the time elapsed since Last-Modified
now = datetime.datetime.utcnow()
print(now + (now - lm) / 10)
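The same arithmetic can be checked against the headers captured earlier. The sketch below assumes the browser measures the interval from the response's Date header rather than from the current time, and uses the 10% fraction quoted from the RFC. The helper names parse_http_date and heuristic_expires are made up for this illustration.

```python
import datetime
import email.utils


def parse_http_date(value):
    """Parse an HTTP date such as 'Mon, 15 Dec 2014 12:32:40 GMT' into a naive UTC datetime."""
    return datetime.datetime(*email.utils.parsedate(value)[:6])


def heuristic_expires(date_header, last_modified_header, divisor=10):
    """Return Date + (Date - Last-Modified) / divisor.

    Assumption: the cache uses the Date response header as its reference
    point and a 10% fraction (divisor=10), as RFC 2616 suggests.
    """
    date = parse_http_date(date_header)
    last_modified = parse_http_date(last_modified_header)
    return date + (date - last_modified) / divisor


expires = heuristic_expires(
    'Mon, 15 Dec 2014 12:32:40 GMT',   # Date header from the response above
    'Wed, 22 Jan 2014 09:22:27 GMT',   # Last-Modified header from the response above
)
print(expires.replace(microsecond=0))
```

With the Date and Last-Modified values from the response above, this gives 2015-01-17 05:39:41, which is the Expires value Firebug reported for the cache entry.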

When you refresh the page, the browser adds Cache-Control and If-Modified-Since to the request:

Accept-Encoding: gzip,deflate
Cache-Control: max-age=0
Host: 127.0.0.1:8080
If-Modified-Since: Wed, 22 Jan 2014 09:22:27 GMT

CherryPy replies with 304 Not Modified if the file hasn't changed. Here's how it works:

cherrypy.lib.static.serve_file

def serve_file(path, content_type=None, disposition=None, name=None, debug=False):
    # ...

    try:
        st = os.stat(path)
    except OSError:
        if debug:
            cherrypy.log('os.stat(%r) failed' % path, 'TOOLS.STATIC')
        raise cherrypy.NotFound()

    # ...

    # Set the Last-Modified response header, so that
    # modified-since validation code can work.
    response.headers['Last-Modified'] = httputil.HTTPDate(st.st_mtime)
    cptools.validate_since()

    # ...

cherrypy.lib.cptools.validate_since

def validate_since():
    """Validate the current Last-Modified against If-Modified-Since headers.

    If no code has set the Last-Modified response header, then no validation
    will be performed.
    """
    response = cherrypy.serving.response
    lastmod = response.headers.get('Last-Modified')
    if lastmod:  
        # ...

        since = request.headers.get('If-Modified-Since')
        if since and since == lastmod:
            if (status >= 200 and status <= 299) or status == 304:
                if request.method in ("GET", "HEAD"):
                    raise cherrypy.HTTPRedirect([], 304)
                else:
                    raise cherrypy.HTTPError(412)
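To make the decision in that excerpt easier to follow, here is a standalone sketch of the same comparison. It is not CherryPy's code: the function conditional_status is a made-up name, it takes the header values as plain arguments, and it only covers the If-Modified-Since branch shown above.

```python
def conditional_status(last_modified, if_modified_since, method='GET', status=200):
    """Return the status implied by the If-Modified-Since check in the excerpt.

    Hypothetical helper mirroring the excerpt: the two header values are
    compared as exact strings, not as parsed dates.
    """
    if last_modified and if_modified_since and if_modified_since == last_modified:
        if 200 <= status <= 299 or status == 304:
            # Safe methods get 304 Not Modified; others get 412 Precondition Failed.
            return 304 if method in ('GET', 'HEAD') else 412
    # No Last-Modified, no If-Modified-Since, or a mismatch: serve normally.
    return status


lm = 'Wed, 22 Jan 2014 09:22:27 GMT'
print(conditional_status(lm, lm))                                  # unchanged file
print(conditional_status(lm, 'Tue, 21 Jan 2014 09:22:27 GMT'))    # stale client copy
print(conditional_status(lm, None))                                # first request
```

Because the excerpt compares the two strings for equality, a client has to echo the Last-Modified value exactly as it received it to get a 304.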

Wrap up

With tools.staticdir, CherryPy neither sends nor gzips the file contents for requests that carry a valid If-Modified-Since header; it only asks the filesystem for the modification time and responds with 304 Not Modified. Unless the page is refreshed, the server won't even receive a request, because browsers apply a heuristic expiration time when the server doesn't provide one. Of course, it won't hurt to make your configuration more deterministic by providing a cache time-to-live, like:

'/static' : {
  'tools.gzip.on'       : True,
  'tools.staticdir.on'  : True,
  'tools.staticdir.dir' : os.path.join(path, 'static'),
  'tools.expires.on'    : True,
  'tools.expires.secs'  : 3600 # expire in an hour
}
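As a rough illustration of what that time-to-live means on the wire, the sketch below computes an Expires value one hour after a given Date header. It assumes tools.expires sets Expires to approximately the response time plus secs; expires_header is a made-up helper, not part of CherryPy.

```python
import calendar
import email.utils


def expires_header(date_header, secs):
    """Return an HTTP date that is `secs` seconds after `date_header`.

    Hypothetical helper; assumes Expires = response time + secs.
    """
    response_time = calendar.timegm(email.utils.parsedate(date_header))
    return email.utils.formatdate(response_time + secs, usegmt=True)


# Date header taken from the response captured earlier, with secs = 3600
print(expires_header('Mon, 15 Dec 2014 12:32:40 GMT', 3600))
```

With an explicit Expires header in place, the browser no longer needs to fall back on the Last-Modified heuristic to decide how long its cached copy stays fresh.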


Source: https://stackoverflow.com/questions/21961073/cherrypy-caching-of-static-files