How to check if the content of a webpage has changed?

失恋的感觉 2020-12-30 11:23

Basically I'm trying to run some code (Python 2.7) if the content on a website changes, otherwise wait for a bit and check it later.

I'm thinking of comparing

6 Answers
  •  时光说笑
    2020-12-30 11:49

    Safest solution:

    Download the content, compute a SHA-512 hash of it, store the hash in a database, and compare the new hash with the stored one on every check.

    Pros: You don't depend on any server headers, and you will detect any modification.
    Cons: High bandwidth usage, because you have to download the full content every time.
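    A minimal sketch of the hash approach, written for Python 3 (the question's Python 2.7 would need `urllib2` instead of `urllib.request`). The SQLite table `page_digest` and the helpers `open_store`, `check_and_store` and `fetch` are hypothetical stand-ins for "keep it in the db"; the first time a URL is seen it is reported as changed.

```python
import hashlib
import sqlite3
import urllib.request


def content_digest(body):
    # SHA-512 over the raw bytes: any change to the body changes the digest.
    return hashlib.sha512(body).hexdigest()


def open_store(path=":memory:"):
    # Hypothetical schema: one row per watched URL holding its last digest.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_digest (url TEXT PRIMARY KEY, digest TEXT)"
    )
    return conn


def check_and_store(conn, url, body):
    """Compare body's digest with the stored one, save the new one, return True if changed."""
    new = content_digest(body)
    row = conn.execute(
        "SELECT digest FROM page_digest WHERE url = ?", (url,)
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO page_digest (url, digest) VALUES (?, ?)", (url, new)
    )
    conn.commit()
    # No stored row yet means first sighting, which is treated as a change.
    return row is None or row[0] != new


def fetch(url, timeout=10):
    # Downloads the whole page every time: this is the bandwidth cost noted above.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

    Typical use would be `changed = check_and_store(conn, url, fetch(url))` inside whatever loop does the waiting.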

    Using HEAD

    Request the page with the HEAD method and check these response headers:

    • Last-Modified: the server should report when the page was last generated or modified.
    • ETag: a checksum-like value defined by the server that should change whenever the content changes.

    Pros: Much less bandwidth usage and very fast checks.
    Cons: Not all servers provide these headers or keep them accurate. If they indicate a change, you still need a separate GET request to fetch the actual resource.
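    A sketch of the HEAD check, again Python 3 with hypothetical helper names (`head_validators`, `validators_changed`). It compares the ETag when both responses carry one, otherwise compares Last-Modified as an exact string, and returns None when the server sends neither header, meaning you have to fall back to downloading the content.

```python
import urllib.request


def head_validators(url, timeout=10):
    """Send a HEAD request and return (ETag, Last-Modified); either may be None."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("ETag"), resp.headers.get("Last-Modified")


def validators_changed(old, new):
    """Compare two (etag, last_modified) pairs.

    Returns True or False, or None when the new response has neither
    header and the caller must fall back to a full GET.
    """
    old_etag, old_lm = old
    new_etag, new_lm = new
    if old_etag is not None and new_etag is not None:
        return old_etag != new_etag
    if old_lm is not None and new_lm is not None:
        # Exact string comparison; assumes the server formats dates consistently.
        return old_lm != new_lm
    if new_etag is None and new_lm is None:
        return None
    # A validator appeared with no matching baseline: treat it as a change.
    return True
```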

    Using GET

    Request the page with the GET method and a conditional request header:

    • If-Modified-Since: the server checks whether the resource has been modified since the given time and either returns the content or responds with 304 Not Modified.

    Pros: Still uses little bandwidth, and a single round trip both checks for changes and retrieves the data.
    Cons: Again, not every server supports this header.
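    A sketch of the conditional GET in Python 3; `conditional_headers` and `conditional_get` are hypothetical names. Besides If-Modified-Since it also sends If-None-Match with a saved ETag, the conditional header that pairs with ETag. `urllib` raises `HTTPError` for a 304 response, so that case is caught and reported as `body is None`.

```python
import urllib.error
import urllib.request


def conditional_headers(last_modified=None, etag=None):
    """Build conditional request headers from validators saved on the previous fetch."""
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified  # HTTP-date from Last-Modified
    if etag:
        headers["If-None-Match"] = etag  # ETag value sent back unchanged
    return headers


def conditional_get(url, last_modified=None, etag=None, timeout=10):
    """Return (body, last_modified, etag); body is None when the server answers 304."""
    req = urllib.request.Request(url, headers=conditional_headers(last_modified, etag))
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # 2xx: new content plus the validators to send next time.
            return (resp.read(), resp.headers.get("Last-Modified"),
                    resp.headers.get("ETag"))
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: nothing downloaded, keep the previous validators.
            return None, last_modified, etag
        raise
```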

    In the end, a combination of the above approaches is probably the best way to do this.
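    One possible mix, sketched as a hypothetical `PageWatcher` class plus a `watch` polling loop (Python 3). Each check is a conditional GET; a 304 counts as unchanged, and when the server returns a body anyway the SHA-512 digest decides, which covers servers that ignore the conditional headers. `on_change` is whatever code you want to run and `interval` is in seconds. State is kept in memory rather than a database, and the first successful fetch counts as a change.

```python
import hashlib
import time
import urllib.error
import urllib.request


class PageWatcher:
    """Hypothetical helper: conditional GET first, SHA-512 comparison as fallback."""

    def __init__(self, url, timeout=10):
        self.url = url
        self.timeout = timeout
        self.etag = None           # validator from the last response, if any
        self.last_modified = None  # validator from the last response, if any
        self.digest = None         # SHA-512 of the last body downloaded

    def _request(self):
        """Send a conditional GET; return (body or None on 304, etag, last_modified)."""
        headers = {}
        if self.etag:
            headers["If-None-Match"] = self.etag
        if self.last_modified:
            headers["If-Modified-Since"] = self.last_modified
        req = urllib.request.Request(self.url, headers=headers)
        try:
            with urllib.request.urlopen(req, timeout=self.timeout) as resp:
                return (resp.read(), resp.headers.get("ETag"),
                        resp.headers.get("Last-Modified"))
        except urllib.error.HTTPError as err:
            if err.code == 304:  # Not Modified: keep the current validators
                return None, self.etag, self.last_modified
            raise

    def observe(self, body, etag=None, last_modified=None):
        """Record one response and return True if the content is new."""
        self.etag, self.last_modified = etag, last_modified
        if body is None:  # 304, nothing downloaded
            return False
        new_digest = hashlib.sha512(body).hexdigest()
        changed = new_digest != self.digest
        self.digest = new_digest
        return changed

    def check(self):
        return self.observe(*self._request())


def watch(watcher, on_change, interval=300, max_checks=None):
    """Call on_change(watcher) whenever a check reports a change, sleeping in between."""
    checks = 0
    while True:
        if watcher.check():
            on_change(watcher)
        checks += 1
        if max_checks is not None and checks >= max_checks:
            return
        time.sleep(interval)
```

    For example, `watch(PageWatcher(url), lambda w: print("changed"), interval=600)` would poll every ten minutes until interrupted.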
