How are ETags generated and configured?

無奈伤痛 2020-12-22 05:58

I recently came across the concept of the ETag HTTP header (this). But I still have a question: for a particular HTTP resource, who is responsible to generate ET

2 answers
  • 2020-12-22 06:24

    As with most aspects of the HTTP specification, the responsibility ultimately lies with whoever is providing the resource.

    Of course, it's often the case that we use tools (servers, load balancers, application frameworks, etc.) that help us fulfill those responsibilities. But no specification defines what a "web server", as opposed to the application, is expected to provide; it's just a practical question of what features are available in the tools you're using.

    Now, looking at ETags in particular, a common situation is that the framework or web server can be configured to automatically hash the response (either the body or something else) and put the result in the ETag. Then, on a conditional request, it will generate a response and hash it to see if it has changed, and automatically send the conditional response if it hasn't.
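
    The hash-and-compare flow described above can be sketched in a few lines of plain Python. This is only an illustration, not how nginx or Django implement it: the names `make_etag` and `respond` are hypothetical, SHA-256 is an arbitrary choice of hash, and splitting `If-None-Match` on commas is a simplification that works here only because the generated tags never contain commas.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Return a strong ETag: the quoted SHA-256 hex digest of the body."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def _opaque(tag: str) -> str:
    """Drop a leading W/ so tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(body: bytes,
            if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)  # note: the body must already exist to be hashed
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # client copy is current: send no body
    return 200, headers, body


status, headers, payload = respond(b"hello")
print(status, headers["ETag"])
print(respond(b"hello", headers["ETag"])[0])    # same content
print(respond(b"changed", headers["ETag"])[0])  # content has changed
```

    The second call returns 304 with an empty payload, while the third returns 200 because the stored content no longer hashes to the tag the client sent.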

    To take two examples that I'm familiar with, nginx can do this with static files at web server level, and Django can do this with dynamic responses at the application level.

    That approach is common, easy to configure, and works pretty well. In some situations, though, it might not be the best fit for your use case. For example:

    • To compute a hash to compare to the incoming ETag you first have to have a response. So although the conditional response can save you the overhead of transmitting the response, it can't save you the cost of generating the response. So if generating your response is expensive, and you have an alternative source of ETags (for example, version numbers stored in the database), you can use that to achieve better performance.
    • If you're planning to use the ETags to prevent accidental overwrites with state-changing methods, you will probably need to add your own application code to make your compare-and-set logic atomic.

    So in some situations you might want to create your ETags at the application level. To take Django as an example again, it provides an easy way for you to provide your own function to compute ETags.
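
    Both points can be illustrated with a framework-free sketch. It assumes a hypothetical in-memory `VersionedStore` in which every resource carries a version counter; the ETag is derived from that counter, so a conditional GET can be answered without rendering anything, and the `If-Match` check and the write happen under one lock so the compare-and-set is atomic. In Django, the hook for supplying your own ETag function is the `etag` decorator in `django.views.decorators.http`; the sketch below deliberately avoids that dependency.

```python
import threading
from typing import Dict, Optional, Tuple


class VersionedStore:
    """Hypothetical store: each key maps to (version, body)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, Tuple[int, bytes]] = {}

    @staticmethod
    def _etag(version: int) -> str:
        return f'"v{version}"'

    def get(self, key: str, if_none_match: Optional[str] = None):
        """Return (status, etag, body); the ETag needs only the version."""
        with self._lock:
            version, body = self._data[key]
        etag = self._etag(version)
        if if_none_match == etag:
            # A real application would skip its expensive rendering step here.
            return 304, etag, b""
        return 200, etag, body

    def put(self, key: str, body: bytes, if_match: Optional[str] = None):
        """Return (status, etag); rejects the write if If-Match is stale."""
        with self._lock:  # check and write are one atomic step
            current = self._data.get(key)
            current_etag = self._etag(current[0]) if current else None
            if if_match is not None and if_match != current_etag:
                return 412, current_etag  # Precondition Failed
            new_version = (current[0] if current else 0) + 1
            self._data[key] = (new_version, body)
            return 200, self._etag(new_version)


store = VersionedStore()
_, first = store.put("doc", b"draft 1")
print(store.put("doc", b"draft 2", if_match=first))  # accepted
print(store.put("doc", b"draft 3", if_match=first))  # stale tag, rejected
```

    The second `put` succeeds and bumps the version; the third reuses the now-stale tag and is rejected with 412, which is the accidental-overwrite protection described above.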

    In sum, it's ultimately your responsibility to provide the ETags for the resources you control, but you may well be able to take advantage of the tools in your software stack to do it for you.

  • 2020-12-22 06:33

    Here is an overview of typical algorithms used in web servers. Suppose we have a file with:

    • Size: 1047 bytes, i.e. 417 in hex.
    • MTime: last modified on Mon, 06 Jan 2020 12:54:56 GMT, which is 1578315296 seconds in Unix time, 1578315296666771 microseconds, or 1578315296666771000 nanoseconds.
    • Inode: the physical file number, 66, i.e. 42 in hex.

    Different web servers return ETags like these:

    • Nginx: "5e132e20-417" i.e. "hex(MTime)-hex(Size)". Not configurable.
    • BusyBox httpd: the same as Nginx.
    • Apache/2.2: "42-417-59b782a99f493" i.e. "hex(INode)-hex(Size)-hex(MTime in microseconds)" (59b782a99f493 is 1578315296666771). Can be configured, but MTime will be in microseconds anyway.
    • Apache/2.4: "417-59b782a99f493" i.e. "hex(Size)-hex(MTime in microseconds)", i.e. without the INode, which is friendlier to load balancing, where identical files have different inodes on different servers.
    • OpenWrt uhttpd: "42-417-5e132e20" i.e. "hex(INode)-hex(Size)-hex(MTime)". Not configurable.
    • Tomcat 9: W/"1047-1578315296666" i.e. Weak"Size-MTime in milliseconds". This is an incorrect ETag: for a static file it should be strong, i.e. guarantee octet-for-octet equality.
    • lighttpd: the strangest one: "hashcode(42-1047-1578315296666771000)" i.e. INode-Size-MTime, but then reduced to a single integer by a hash code. Can be configured, but you can only disable a part (etag.use-inode = "disabled").
    • MS IIS: it has the form Filetimestamp:ChangeNumber, e.g. "53dbd5819f62d61:0". Not documented and not configurable, but it can be disabled.
    • Jetty: a hash based on the last modification time and size. See Resource.getWeakETag()
    • Kitura (Swift): "W/hex(Size)-hex(MTime)" StaticFileServer.calculateETag
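
    To make the formats above easier to compare, here is a small Python sketch that rebuilds several of them from the sample file's values. The function name `etag_formats` and the dictionary keys are made up for illustration; the format strings simply mirror the list above and are not taken from any server's source code. Note that the Apache hex value 59b782a99f493 equals 1578315296666771, i.e. the modification time in microseconds.

```python
from typing import Dict


def etag_formats(size: int, mtime_ns: int, inode: int) -> Dict[str, str]:
    """Rebuild the ETag shapes listed above from raw file attributes."""
    mtime_s = mtime_ns // 1_000_000_000   # whole seconds
    mtime_ms = mtime_ns // 1_000_000      # milliseconds
    mtime_us = mtime_ns // 1_000          # microseconds
    return {
        "nginx": f'"{mtime_s:x}-{size:x}"',
        "apache-2.2": f'"{inode:x}-{size:x}-{mtime_us:x}"',
        "apache-2.4": f'"{size:x}-{mtime_us:x}"',
        "uhttpd": f'"{inode:x}-{size:x}-{mtime_s:x}"',
        "tomcat-9": f'W/"{size}-{mtime_ms}"',
    }


# Values of the sample file described above.
for server, tag in etag_formats(1047, 1578315296666771000, 66).items():
    print(f"{server:12} {tag}")
```

    For a real file, the three inputs are available from `os.stat(path)` as `st_size`, `st_mtime_ns` and `st_ino`.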

    A few thoughts:

    • Hex numbers are used so often here because it's cheap to convert a number to a hex string, and the result is shorter than the decimal form.
    • The inode adds guarantees, but it makes load balancing impossible and is very fragile: the ETag changes if you simply copy the file during an application redeploy. Sub-second MTime is not available on all platforms, and such granularity is not needed.
    • Apache has a bug report about this: https://bz.apache.org/bugzilla/show_bug.cgi?id=55573
    • The order (MTime-Size or Size-MTime) also matters: MTime is more likely to change, so with MTime first a string comparison of two ETags may finish a dozen CPU cycles sooner.
    • Even though this is not a full checksum hash, it is definitely not a weak ETag. It is enough to signal that we expect octet-for-octet equality, which Range requests need.
    • Apache and Nginx together serve almost all traffic on the Internet, but most static files are served by Nginx, where the format is not configurable.
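
    The weak versus strong distinction mentioned above comes down to which comparison a tag can pass. Strong comparison requires that neither tag is weak and that the quoted values are identical; weak comparison ignores the W/ prefix. The sketch below implements both rules as defined in the HTTP conditional requests specification; the function names are my own.

```python
from typing import Tuple


def parse_etag(value: str) -> Tuple[bool, str]:
    """Split an entity-tag into (is_weak, quoted opaque tag)."""
    value = value.strip()
    if value.startswith("W/"):
        return True, value[2:]
    return False, value


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and the values are identical."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: the values are identical, weakness is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]


nginx_tag = '"5e132e20-417"'
tomcat_tag = 'W/"1047-1578315296666"'
print(strong_match(nginx_tag, nginx_tag))    # usable as a range validator
print(strong_match(tomcat_tag, tomcat_tag))  # a weak tag never matches strongly
print(weak_match(tomcat_tag, tomcat_tag))    # still fine for cache revalidation
```

    Range requests rely on strong comparison, so a weak tag such as Tomcat's cannot be used to validate a partial download even when the file is unchanged.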

    It looks like Nginx uses the most reasonable scheme, so if you are implementing ETags yourself, try to make yours the same. The whole ETag can be generated in C with one line:

    /* PRIx64 is defined in <inttypes.h>; last_mod and file_size are assumed to be uint64_t */
    printf("\"%" PRIx64 "-%" PRIx64 "\"", last_mod, file_size);
    

    My proposal is to take the Nginx scheme and have the W3C recommend it as the standard ETag algorithm.
