I recently came across the concept of the ETag HTTP header. But I still have a question: for a particular HTTP resource, who is responsible for generating ETags?
As with most aspects of the HTTP specification, the responsibility ultimately lies with whoever is providing the resource.
Of course, it's often the case that we use tools (servers, load balancers, application frameworks, etc.) that help us fulfill those responsibilities. But there isn't any specification defining what a "web server", as opposed to the application, is expected to provide; it's just a practical question of what features are available in the tools you're using.
Now, looking at ETags in particular, a common situation is that the framework or web server can be configured to automatically hash the response (either the body or something else) and put the result in the ETag. Then, on a conditional request, it will generate a response and hash it to see if it has changed, and automatically send the conditional response if it hasn't.
To take two examples that I'm familiar with, nginx can do this with static files at the web server level, and Django can do this with dynamic responses at the application level.
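The hash-then-compare behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not how nginx or Django actually implement it: the names `make_etag` and `respond` are hypothetical, SHA-256 is an arbitrary choice of hash, and the header parsing is simplified (it assumes entity tags contain no commas).

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # A strong ETag: a quoted, truncated hex digest of the response body.
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def respond(body: bytes,
            if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    # Note that the full body must already exist before it can be hashed.
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may carry "*" or a comma-separated list of entity tags.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a leading "W/" is ignored.
        candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # Not Modified: no body is sent
    return 200, headers, body
```

A client that repeats the request with the ETag it received gets status 304 and an empty payload as long as the body is unchanged; any change to the body produces a different hash and a normal 200 response.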
That approach is common, easy to configure, and works pretty well. In some situations, though, it might not be the best fit for your use case. For example:
- To compute a hash-based ETag you first have to have a response. So although the conditional response can save you the overhead of transmitting the response, it can't save you the cost of generating it. So if generating your response is expensive, and you have an alternative source of ETags (for example, version numbers stored in the database), you can use that to achieve better performance.
- If you use ETags to prevent accidental overwrites with state-changing methods, you will probably need to add your own application code to make your compare-and-set logic atomic.

So in some situations you might want to create your ETags at the application level. To take Django as an example again, it provides an easy way for you to provide your own function to compute ETags.
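To make the two points above concrete, here is a rough Python sketch of application-level ETags derived from a version number, with an atomic compare-and-set for updates. Everything in it is an assumption for illustration: `VersionedStore` is a hypothetical in-memory stand-in for a database table with a version column, and it does not use Django's API.

```python
import threading
from typing import Dict, Optional, Tuple


class VersionedStore:
    """Hypothetical in-memory store: each resource has a version and a body."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: Dict[str, Tuple[int, bytes]] = {}

    def create(self, key: str, body: bytes) -> str:
        with self._lock:
            self._items[key] = (1, body)
        return '"v1"'

    def etag(self, key: str) -> Optional[str]:
        # Cheap lookup: the ETag is known without generating a response body.
        with self._lock:
            item = self._items.get(key)
        return None if item is None else '"v%d"' % item[0]

    def put_if_match(self, key: str, if_match: str,
                     body: bytes) -> Tuple[int, Optional[str]]:
        # The check and the write happen under one lock, so two concurrent
        # writers holding the same ETag cannot both succeed.
        with self._lock:
            item = self._items.get(key)
            if item is None:
                return 404, None
            version = item[0]
            if if_match != '"v%d"' % version:
                return 412, '"v%d"' % version  # Precondition Failed
            self._items[key] = (version + 1, body)
            return 200, '"v%d"' % (version + 1)
```

With a real database the lock would typically be replaced by a transaction or a conditional `UPDATE ... WHERE version = ?`, but the shape of the logic stays the same: compare the client's ETag and write in a single atomic step.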
In sum, it's ultimately your responsibility to provide the ETags for the resources you control, but you may well be able to take advantage of the tools in your software stack to do it for you.
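Overview of typical algorithms used in web servers. Consider a file with size 1047 bytes (417 in hex), MTime 1578315296 seconds in Unix time (5e132e20 in hex), and inode 66 (42 in hex).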
Different web servers return ETags like:
- "5e132e20-417" i.e. "hex(MTime)-hex(Size)". Not configurable.
- "42-417-59b782a99f493" i.e. "hex(INode)-hex(Size)-hex(MTime in microseconds)". Can be configured, but MTime will be in microseconds anyway.
- "417-59b782a99f493" i.e. "hex(Size)-hex(MTime in microseconds)", i.e. without INode, which is friendly for load balancing when identical files have different INodes on different servers.
- "42-417-5e132e20" i.e. "hex(INode)-hex(Size)-hex(MTime)". Not configurable.
- W/"1047-1578315296666" i.e. Weak "Size-MTime in milliseconds". This is an incorrect ETag because for a static file it should be strong, i.e. octet compatibility (byte-for-byte identical content).
- "hashcode(42-1047-1578315296666771000)" i.e. INode-Size-MTime, but then reduced to a simple integer by hashcode. Can be configured, but you can only disable one part (etag.use-inode = "disabled").
- "W/hex(Size)-hex(MTime)", see StaticFileServer.calculateETag.

Few thoughts:
- MTime in nanoseconds is not available on all platforms, and such granularity is not needed.
- The order MTime-Size or Size-MTime also matters: MTime is more likely to change, so putting it first may make comparing ETag strings faster by a dozen CPU cycles.

It looks like Nginx uses the most reasonable scheme, so if you are implementing ETags, try to make yours the same. The whole ETag is generated in C with one line:
printf("\"%" PRIx64 "-%" PRIx64 "\"", last_mod, file_size);  /* PRIx64 comes from <inttypes.h>; both arguments must be uint64_t */
My proposal is to take the Nginx scheme and have it adopted as a recommended ETag algorithm by the W3C.
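As a rough illustration of the metadata-based schemes listed above, the following Python sketch builds three of the formats from a file's inode, size, and modification time. The function names are hypothetical and not taken from any server's source; only the output formats come from the list above.

```python
import os


def etag_mtime_size(size: int, mtime_ns: int) -> str:
    # "hex(MTime)-hex(Size)", with MTime truncated to whole seconds
    return '"%x-%x"' % (mtime_ns // 10**9, size)


def etag_size_mtime_us(size: int, mtime_ns: int) -> str:
    # "hex(Size)-hex(MTime)", with MTime in microseconds
    return '"%x-%x"' % (size, mtime_ns // 1000)


def etag_inode_size_mtime(inode: int, size: int, mtime_ns: int) -> str:
    # "hex(INode)-hex(Size)-hex(MTime)", with MTime in whole seconds
    return '"%x-%x-%x"' % (inode, size, mtime_ns // 10**9)


def etag_for_path(path: str) -> str:
    # Only file metadata is read; the file content is never hashed.
    st = os.stat(path)
    return etag_mtime_size(st.st_size, st.st_mtime_ns)


# Values matching the examples above: inode 0x42, size 0x417 (1047 bytes).
MTIME_NS = 1578315296666771000
print(etag_mtime_size(1047, MTIME_NS))              # "5e132e20-417"
print(etag_size_mtime_us(1047, MTIME_NS))           # "417-59b782a99f493"
print(etag_inode_size_mtime(0x42, 1047, MTIME_NS))  # "42-417-5e132e20"
```

Because these tags depend only on `stat` metadata, they are cheap to produce, but the inode-based variant will differ between servers that hold identical copies of the same file.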