Question
I have a server that serves a large amount of static content. The CherryPy tool tools.gzip is enabled to compress the files whenever the client supports gzip encoding.
Question: Is CherryPy gzipping the static files every time they are requested, or does it gzip the content once and serve that gzipped copy to all requests?
If CherryPy is currently gzipping the files every time they are requested, would enabling tools.caching prevent that, or is there a better way?
Answer 1:
First, I would like to note that despite HTTP's apparent simplicity, which stems from its enormous spread and the availability of good client libraries for every language, HTTP is in fact a complex protocol involving multiple interacting tiers. Caching is no exception (see RFC 2616, Section 13). The following covers Last-Modified/If-Modified-Since only, because ETag validation combined with gzip is another story.
Setup
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os

import cherrypy


path = os.path.abspath(os.path.dirname(__file__))
config = {
    'global': {
        'server.socket_host': '127.0.0.1',
        'server.socket_port': 8080,
        'server.thread_pool': 8
    },
    # Serve ./static at /static, gzip-compressed when the client accepts it
    '/static': {
        'tools.gzip.on': True,
        'tools.staticdir.on': True,
        'tools.staticdir.dir': os.path.join(path, 'static')
    }
}


if __name__ == '__main__':
    cherrypy.quickstart(config=config)
Then put some plain-text or HTML files in the static directory.
Experiment
Firefox and Chromium don't send any cache-related headers on the first request. Request headers for GET /static/some.html:
Accept-Encoding: gzip, deflate
Host: 127.0.0.1:8080
Response:
Accept-Ranges: bytes
Content-Encoding: gzip
Content-Length: 50950
Content-Type: text/html
Date: Mon, 15 Dec 2014 12:32:40 GMT
Last-Modified: Wed, 22 Jan 2014 09:22:27 GMT
Server: CherryPy/3.6.0
Vary: Accept-Encoding
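To reproduce this experiment without a browser, a small standard-library client can send the same Accept-Encoding header and undo the gzip encoding itself. This is a sketch assuming Python 3 and the server from the setup above running on 127.0.0.1:8080; `fetch` and `decode_body` are hypothetical helper names, not part of CherryPy.

```python
import gzip
import urllib.error
import urllib.request


def decode_body(headers, raw):
    """Undo Content-Encoding: gzip; any other body is returned unchanged."""
    if headers.get('Content-Encoding', '').lower() == 'gzip':
        return gzip.decompress(raw)
    return raw


def fetch(url, extra_headers=None):
    """GET url while advertising gzip support; return (status, headers, body)."""
    request = urllib.request.Request(url, headers={'Accept-Encoding': 'gzip'})
    for name, value in (extra_headers or {}).items():
        request.add_header(name, value)
    try:
        with urllib.request.urlopen(request) as response:
            raw = response.read()  # still gzip-compressed at this point
            return response.status, response.headers, decode_body(response.headers, raw)
    except urllib.error.HTTPError as err:
        # urllib raises for 304 Not Modified too; such a response has no body
        return err.code, err.headers, b''
```

For example, `status, headers, body = fetch('http://127.0.0.1:8080/static/some.html')` should report `headers['Content-Encoding'] == 'gzip'`, with `headers['Content-Length']` giving the compressed size and `len(body)` the decoded size. A second call passing `{'If-Modified-Since': headers['Last-Modified']}` as `extra_headers` should return status 304 and an empty body, as explained below.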
On subsequent requests, no network request is made at all; Firebug shows the following cache entry:
Data Size: 50950
Device: disk
Expires: Sat Jan 17 2015 05:39:41 GMT
Fetch Count: 6
Last Fetched: Mon Dec 15 2014 13:19:45 GMT
Last Modified: Mon Dec 15 2014 13:19:44 GMT
Because CherryPy doesn't provide an expiration time (Expires or Cache-Control) by default, Firefox (and likely Chromium too) applies the heuristic described in RFC 2616 Section 13.2.4:
If none of Expires, Cache-Control: max-age, or Cache-Control: s-maxage (see section 14.9.3) appears in the response, and the response does not include other restrictions on caching, the cache MAY compute a freshness lifetime using a heuristic...
Also, if the response does have a Last-Modified time, the heuristic expiration value SHOULD be no more than some fraction of the interval since that time. A typical setting of this fraction might be 10%.
The following code demonstrates the heuristic nature of the Expires value (the current time plus 10% of the time elapsed since Last-Modified):
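import email.utils
from datetime import datetime, timezone

s = 'Wed, 22 Jan 2014 09:22:27 GMT'
lm = email.utils.parsedate_to_datetime(s)  # timezone-aware UTC datetime
now = datetime.now(timezone.utc)
# Heuristic expiry: now plus 10% of the time elapsed since Last-Modified
print(now + (now - lm) / 10)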
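The snippet above only covers the heuristic case. A slightly fuller sketch of how a cache might choose a freshness lifetime, following the order in RFC 2616 Section 13.2.4 (s-maxage for shared caches, then max-age, then Expires minus Date, then the typical 10% Last-Modified heuristic), could look like this. `freshness_lifetime` and `parse_cache_control` are hypothetical helpers; the sketch assumes a Date header is present and ignores directives such as no-cache, so it is not how Firefox actually implements caching.

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split 'max-age=60, public' into {'max-age': '60', 'public': ''}."""
    directives = {}
    for item in value.split(','):
        name, _, arg = item.strip().partition('=')
        if name:
            directives[name.lower()] = arg.strip('"')
    return directives


def freshness_lifetime(headers, shared=False, fraction=0.1):
    """Return the freshness lifetime in seconds, roughly per RFC 2616 13.2.4."""
    directives = parse_cache_control(headers.get('Cache-Control', ''))
    if shared and 's-maxage' in directives:
        return int(directives['s-maxage'])  # only shared caches honour s-maxage
    if 'max-age' in directives:
        return int(directives['max-age'])  # max-age overrides Expires
    date = parsedate_to_datetime(headers['Date'])
    if 'Expires' in headers:
        expires = parsedate_to_datetime(headers['Expires'])
        return max(0, int((expires - date).total_seconds()))
    if 'Last-Modified' in headers:
        # Heuristic: a fraction (typically 10%) of the time since modification
        last_modified = parsedate_to_datetime(headers['Last-Modified'])
        return max(0, int((date - last_modified).total_seconds() * fraction))
    return 0
```

Fed the experiment's response headers (Date: Mon, 15 Dec 2014 12:32:40 GMT and Last-Modified: Wed, 22 Jan 2014 09:22:27 GMT, with no Expires or Cache-Control), it returns 2826421 seconds, about 32.7 days. Added to that Date, this gives Sat, 17 Jan 2015 05:39:41 GMT, which is exactly the Expires value Firebug shows.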
When you refresh the page, the browser adds Cache-Control and If-Modified-Since headers to the request:
Accept-Encoding: gzip,deflate
Cache-Control: max-age=0
Host: 127.0.0.1:8080
If-Modified-Since: Wed, 22 Jan 2014 09:22:27 GMT
CherryPy replies with 304 Not Modified if the file hasn't changed. Here's how it works:
cherrypy.lib.static.serve_file
def serve_file(path, content_type=None, disposition=None, name=None, debug=False):
    # ...
    try:
        st = os.stat(path)
    except OSError:
        if debug:
            cherrypy.log('os.stat(%r) failed' % path, 'TOOLS.STATIC')
        raise cherrypy.NotFound()
    # ...
    # Set the Last-Modified response header, so that
    # modified-since validation code can work.
    response.headers['Last-Modified'] = httputil.HTTPDate(st.st_mtime)
    cptools.validate_since()
    # ...
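Because Last-Modified is derived from st_mtime alone, the header string stays identical on every request until the file changes, which is what lets the exact string comparison in validate_since work. A rough standard-library equivalent, assuming httputil.HTTPDate produces the same RFC 1123 format as `email.utils.formatdate(..., usegmt=True)` (the header in the experiment above has that shape); `last_modified_header` is a hypothetical name:

```python
import os
import tempfile
from email.utils import formatdate


def last_modified_header(path):
    """Format the file's mtime as an HTTP date, dropping sub-second precision."""
    return formatdate(os.stat(path).st_mtime, usegmt=True)


if __name__ == '__main__':
    with tempfile.NamedTemporaryFile(suffix='.html', delete=False) as f:
        path = f.name
    try:
        os.utime(path, (1390382547, 1390382547))  # 2014-01-22 09:22:27 UTC
        print(last_modified_header(path))  # Wed, 22 Jan 2014 09:22:27 GMT
        os.utime(path, (1390382548, 1390382548))  # file "edited" one second later
        print(last_modified_header(path))  # Wed, 22 Jan 2014 09:22:28 GMT
    finally:
        os.remove(path)
```

Any edit that changes the mtime changes the header, so a browser's cached If-Modified-Since value stops matching and the file is served (and gzipped) again.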
cherrypy.lib.cptools.validate_since
def validate_since():
    """Validate the current Last-Modified against If-Modified-Since headers.

    If no code has set the Last-Modified response header, then no validation
    will be performed.
    """
    response = cherrypy.serving.response
    lastmod = response.headers.get('Last-Modified')
    if lastmod:
        # ...
        since = request.headers.get('If-Modified-Since')
        if since and since == lastmod:
            if (status >= 200 and status <= 299) or status == 304:
                if request.method in ("GET", "HEAD"):
                    raise cherrypy.HTTPRedirect([], 304)
                else:
                    raise cherrypy.HTTPError(412)
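Stripped of CherryPy specifics, the If-Modified-Since branch above boils down to a small decision function. This is a hypothetical restatement for experimenting (`revalidation_status` is not a CherryPy API); it keeps the excerpt's behaviour of comparing the header strings exactly rather than parsing them as dates.

```python
def revalidation_status(method, status, last_modified, if_modified_since):
    """Return 304 or 412 where the excerpt would raise, else None (serve normally)."""
    if not last_modified:
        return None  # no Last-Modified set, so no validation is performed
    if if_modified_since and if_modified_since == last_modified:
        if 200 <= status <= 299 or status == 304:
            # GET/HEAD get an empty 304; other methods fail the precondition
            return 304 if method in ('GET', 'HEAD') else 412
    return None
```

Note the consequence of the exact comparison in the version excerpted here: an If-Modified-Since value that differs from Last-Modified, even one naming a later date, yields None, so the file is read and gzipped in full.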
Wrap up
With tools.staticdir, CherryPy neither sends nor gzips the file contents for requests that carry a valid If-Modified-Since header; it only asks the filesystem for the file's modification time and responds with 304 Not Modified. Without a page refresh it won't even receive a request, because browsers apply the heuristic expiration time when the server doesn't provide one. Of course, making your configuration more deterministic by providing a cache time-to-live won't hurt, for example:
'/static': {
    'tools.gzip.on': True,
    'tools.staticdir.on': True,
    'tools.staticdir.dir': os.path.join(path, 'static'),
    'tools.expires.on': True,
    'tools.expires.secs': 3600  # expire in an hour
}
Source: https://stackoverflow.com/questions/21961073/cherrypy-caching-of-static-files