How to check if the content of a webpage has changed?

失恋的感觉 2020-12-30 11:23

Basically I'm trying to run some code (Python 2.7) if the content on a website changes; otherwise, wait for a bit and check again later.

I'm thinking of comparing

6 answers
  •  野趣味 (OP)
    2020-12-30 11:56

    Use git, which has excellent reporting capabilities on what has changed between two states of a file; plus you won't eat up disk space as git manages the deltas for you.

    You can even tell git to ignore "trivial" changes, such as added or removed whitespace characters, to narrow the comparison further.

    In practice, this comes down to parsing the output of git diff -b --numstat HEAD^ HEAD, which roughly translates to "show me what has changed in all the files, ignoring changes in the amount of whitespace, between the previous state and the current state". The output looks like this:

    2       37      en/index.html
    

    That is, 2 lines were added to en/index.html and 37 lines were deleted from it.
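    Parsing that output could look roughly like the sketch below. The function name parse_numstat is hypothetical, and the sketch assumes each line has the form "added deleted path" separated by whitespace, with "-" in the count columns for binary files.

```python
def parse_numstat(output):
    """Parse `git diff --numstat` text into (added, deleted, path) tuples."""
    stats = []
    for line in output.splitlines():
        # Split into at most three fields so a path containing spaces stays whole.
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files have no line counts
        stats.append((int(added), int(deleted), path))
    return stats


sample = "2\t37\ten/index.html\n"
print(parse_numstat(sample))  # [(2, 37, 'en/index.html')]
```

    The text passed in would come from running the git command yourself, for example through the subprocess module.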

    Next, you'll have to experiment to find a "threshold" at which you consider a change significant enough to process the files further. This will take time, as you will have to tune the system (you can also automate this part, but that is another topic altogether).
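    A threshold check might be sketched as follows. The name is_significant and the default of 10 changed lines are assumptions for illustration only; the right value depends on the site and has to be found by experiment.

```python
def is_significant(stats, threshold=10):
    """Return True if the total number of changed lines reaches the threshold.

    `stats` is a list of (added, deleted, path) tuples, one per changed file.
    The default threshold is an arbitrary placeholder, not a recommendation.
    """
    changed = sum(added + deleted for added, deleted, _path in stats)
    return changed >= threshold


stats = [(2, 37, "en/index.html")]
print(is_significant(stats))  # True: 39 changed lines >= 10
```

    Summing added and deleted lines is only one possible measure; you could just as well weigh them differently or apply the threshold per file.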

    Unless you have a very good reason to do so, don't use your traditional relational database as a file system. Let the operating system take care of files, which it's very good at (and which a relational database is not designed to manage).
