Basically I'm trying to run some code (Python 2.7) if the content on a website changes, otherwise wait for a bit and check it later.
I'm thinking of comparing hashes of the content:
download the content, create a SHA-512 checksum of it, keep the hash in the db, and compare it on each check.
Pros: You do not depend on any server headers, and you will detect any modification.
Cons: High bandwidth usage, because you have to download all the content every time.
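A minimal sketch of the hash comparison described above, just to make the idea concrete. The function names (`fetch_body`, `content_digest`, `check_for_change`) are made up for illustration, and where the previous digest is stored (db, file, etc.) is left to you. The import fallback is only there so the same snippet runs on Python 2.7 and Python 3.

```python
import hashlib
from contextlib import closing

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7


def fetch_body(url):
    """Download the full response body as bytes (the expensive part)."""
    with closing(urlopen(url)) as response:
        return response.read()


def content_digest(body):
    """Return the SHA-512 hex digest of the downloaded bytes."""
    return hashlib.sha512(body).hexdigest()


def check_for_change(body, stored_digest):
    """Compare a fresh digest with the stored one.

    stored_digest is whatever was saved on the previous check
    (None on the first run, which counts as "changed").
    Returns (changed, new_digest) so the caller can save new_digest.
    """
    new_digest = content_digest(body)
    return new_digest != stored_digest, new_digest
```

Usage would look something like `changed, digest = check_for_change(fetch_body(url), digest_from_db)`, after which you save `digest` back and run your code only when `changed` is true. Note that pages with dynamic bits (timestamps, tokens) will hash differently on every download.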
HEAD
Request the page using the HEAD verb and check the response headers:
* Last-Modified: the server should provide the last time the page was generated or modified.
* ETag: a checksum-like value defined by the server, which should change as soon as the content changes.
Pros: Much less bandwidth usage and very quick checks.
Cons: Not all servers provide these headers or follow the guidelines. You still need to fetch the actual resource with a GET request once you find that the data has changed.
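Here is a rough sketch of the HEAD approach, assuming the server actually sends `ETag` and/or `Last-Modified`. The names `HeadRequest`, `fetch_validators` and `headers_indicate_change` are hypothetical helpers, not part of any library. Overriding `get_method` is used so the snippet works on both Python 2.7 and Python 3.

```python
from contextlib import closing

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2.7


class HeadRequest(Request):
    """A request that is sent with the HEAD verb instead of GET."""

    def get_method(self):
        return "HEAD"


def fetch_validators(url):
    """Return (etag, last_modified) from a HEAD response; None if absent."""
    with closing(urlopen(HeadRequest(url))) as response:
        headers = response.info()
        return headers.get("ETag"), headers.get("Last-Modified")


def headers_indicate_change(old, new):
    """Compare two (etag, last_modified) tuples.

    ETag is checked first, then Last-Modified. Returns True or False
    when a header is present in both, or None when the server gave
    nothing comparable (fall back to downloading and hashing then).
    """
    for old_value, new_value in zip(old, new):
        if old_value is not None and new_value is not None:
            return old_value != new_value
    return None
```

Treating the "no usable header" case as `None` rather than `False` is a deliberate choice here, so that a server which ignores these headers is not mistaken for a page that never changes.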
GET
Request the page using the GET verb with conditional request headers:
* If-Modified-Since: the server checks whether the resource has been modified since the given time and returns either the content or 304 Not Modified.
Pros: Still uses less bandwidth, and needs only a single round trip to receive the data.
Cons: Again, not all resources support this header.
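A sketch of the conditional GET, again only as an illustration. `fetch_if_modified` is a hypothetical helper, and its `opener` parameter exists only so the function can be exercised without a network connection. With `urllib2` / `urllib.request`, a 304 response surfaces as an `HTTPError`, so that is where it is caught.

```python
from contextlib import closing

try:
    from urllib.request import Request, urlopen  # Python 3
    from urllib.error import HTTPError
except ImportError:
    from urllib2 import Request, urlopen, HTTPError  # Python 2.7


def fetch_if_modified(url, last_modified=None, opener=urlopen):
    """Conditional GET using If-Modified-Since.

    last_modified is the Last-Modified value saved from the previous
    successful fetch (None on the first run).
    Returns (body, last_modified); body is None when the server
    answered 304 Not Modified.
    """
    request = Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    try:
        response = opener(request)
    except HTTPError as error:
        if error.code == 304:
            return None, last_modified  # unchanged, nothing downloaded
        raise
    with closing(response):
        body = response.read()
        # Keep the old value if the server did not send the header.
        return body, response.info().get("Last-Modified", last_modified)
```

If the server ignores the header you simply get a normal 200 with the full body every time, so it can still make sense to hash the body before deciding that something changed.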
Finally, a mix of the above solutions may be the best way to do this.