Getting ETags right

隐瞒了意图╮ 2020-12-12 22:00

I’ve been reading a book and I have a particular question about the ETag chapter. The author says that ETags might harm performance and that you must tune them finely or d

4 answers
  • 2020-12-12 22:33

    ETag is similar to the Last-Modified header. It's a mechanism that lets the client determine whether a resource has changed.

    Arguably, an ETag that JUST HAPPENS to be the Last Modified date (i.e. the same text) meets all the criteria necessary for an ETag. It simply needs to be a unique value representing the state of a resource. Not unique across the entire domain of resources, simply within the resource.

    Now, technically, an ETag has "infinite" resolution compared to a Last-Modified header. Last-Modified only changes at a granularity of 1 second, whereas an ETag can distinguish sub-second changes.
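To illustrate the resolution point, here is a minimal Python sketch. The helper names `last_modified_header` and `mtime_etag` are hypothetical, and building the ETag from a nanosecond modification time plus the size is only one possible scheme, not something the answer prescribes.

```python
from email.utils import formatdate


def last_modified_header(mtime_ns: int) -> str:
    # HTTP dates carry whole seconds, so the sub-second part is dropped.
    return formatdate(mtime_ns // 1_000_000_000, usegmt=True)


def mtime_etag(mtime_ns: int, size: int) -> str:
    # Hypothetical scheme: hex of the nanosecond mtime and the byte size.
    # An ETag is an opaque quoted string, so any stable format would do.
    return f'"{mtime_ns:x}-{size:x}"'


# Two modifications half a second apart, within the same wall-clock second.
first = 1_600_000_000_000_000_000
second = first + 500_000_000

same_date = last_modified_header(first) == last_modified_header(second)
same_etag = mtime_etag(first, 10) == mtime_etag(second, 10)
```

Here `same_date` is true while `same_etag` is false: the second change is invisible to Last-Modified but visible to the ETag.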

    You can implement both ETag and Last-Modified, or simply one or the other (or neither, of course). If Last-Modified is not sufficient, then consider an ETag.

    Mind, I would not set ETag for "every" resource. Basically, I wouldn't set it for anything that has no expectation of being cached (dynamic content notably). There's no point in that case, just wasted work.

    Edit: I see your edit, so let me clarify.

    MD5 is fine. The only downside is calculating MD5 all the time. Running MD5 on, say, a 200K PDF file, is expensive. Running MD5 on a resource that has no expectation of being cached is simply wasteful (i.e. dynamic content).

    The trick is simply that whatever mechanism you use, it should be as cheap as Last-Modified typically is. Last-Modified is, again, typically, a property of the resource, and usually very cheap to access.

    ETags should be similarly cheap. If you are using MD5, and you can cache/store the association between the resource and the MD5 hash, then that's a fine solution. However, recalculating the MD5 each time the ETag is needed basically runs counter to the idea of using ETags to improve overall server performance.

  • 2020-12-12 22:33

    Having not read the book, I can't speak on the author's precise concerns.

    However, the generation of ETags should be such that an ETag is only generated once when a page has changed. Generating an MD5 hash of a web page costs processing power and time on the server; if you have many clients connecting, it could start to cause performance problems.

    Thus, you need a good technique for generating ETags only when necessary and caching them on the server until the related page changes.
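A minimal sketch of that idea in Python, assuming pages live in some server-side store: the hash is computed when a page is written, and every read reuses it. The class name `PageStore` and its methods are hypothetical.

```python
import hashlib


class PageStore:
    """Keeps each page with an ETag computed once, at update time."""

    def __init__(self):
        self._pages = {}     # url -> (body, etag)
        self.hash_count = 0  # how many times MD5 was actually run

    def update(self, url: str, body: bytes) -> None:
        # Hash only here, when the page content changes.
        self.hash_count += 1
        etag = '"' + hashlib.md5(body).hexdigest() + '"'
        self._pages[url] = (body, etag)

    def get(self, url: str):
        # Serving a request reuses the stored ETag; nothing is hashed.
        return self._pages[url]


store = PageStore()
store.update("/index", b"<h1>v1</h1>")
for _ in range(1000):  # many clients, still a single hash
    body, etag = store.get("/index")
```

After the loop `store.hash_count` is still 1; it only increases when `update` is called again.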

  • 2020-12-12 22:42

    We use ETags for our dynamic content at instela.

    Our strategy: at the end of output generation, we compute the MD5 hash of the content about to be sent. If an If-None-Match header is present, we compare it with the generated hash. If the two values are the same, we send a 304 status code and end the request without returning any content.
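The strategy described above could look roughly like the following Python sketch. It is framework-independent and not instela's actual code; `respond` and `etag_matches` are hypothetical names, and the comma-splitting of If-None-Match is a simplification that is safe here only because hex MD5 tags never contain commas.

```python
import hashlib


def etag_matches(if_none_match: str, etag: str) -> bool:
    """Weak comparison of an If-None-Match header value against one ETag."""
    if if_none_match.strip() == "*":
        return True
    for candidate in if_none_match.split(","):
        candidate = candidate.strip()
        if candidate.startswith("W/"):  # ignore the weak-validator prefix
            candidate = candidate[2:]
        if candidate == etag:
            return True
    return False


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for an already generated body."""
    etag = '"' + hashlib.md5(body).hexdigest() + '"'
    headers = {"ETag": etag}
    if if_none_match is not None and etag_matches(if_none_match, etag):
        return 304, headers, b""  # client copy is current: send no content
    return 200, headers, body


# First visit: full response. Repeat visit with the ETag: empty 304.
status1, headers1, payload1 = respond(b"newsfeed for user 42")
status2, headers2, payload2 = respond(b"newsfeed for user 42", headers1["ETag"])
```

Note that the body is still generated and hashed on every request; what is saved is the transfer of the payload, not the server-side work.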

    It's true that we spend a bit of CPU hashing the content, but in the end we save a lot of bandwidth.

    We have a Facebook-style news feed as our main page, with different content for every user. As the news feed content changes only 3-4 times per hour, main page refreshes are very efficient on the client side. In the mobile era I think it's better to spend a bit more CPU time than to spend bandwidth. Bandwidth is still more expensive than CPU, and it's a better experience for the client.

  • 2020-12-12 22:48

    I think the perceived problem with ETags is probably that your browser has to issue a (simple and small) request and parse the response for every resource on your page to check whether the ETag value has changed on the server side.

    I personally find these extra small round trips to the server acceptable for frequently changing images, CSS and JavaScript (the server does not need to resend the content if the browser's ETag is current), since the mechanism makes it quite easy to mark content as updated.
