web-crawler

AttributeError while scraping

≯℡__Kan透↙ posted on 2020-12-13 03:35:23
Question: I am trying to scrape a website and I get this error:

    AttributeError: 'NoneType' object has no attribute 'text'

raised at:

    ---> 12 for x in soup.select("div.site-content")]

The code used is:

    rq = req.get("https://stopcensura.net/category/cronaca")
    soup = BeautifulSoup(rq.content, 'html.parser')
    scrape_info = [(x.h3.a.text, x.time.text) for x in soup.select("div.site-content")]

I would like to get information on the title ( entry-title ), the date ( class="date" ), the author ( <div class="by-author vcard
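This error typically means that a chained lookup such as x.time or x.h3.a returned None for at least one matched element, so .text was read from None. A minimal sketch of a more defensive version follows. It is not the asker's final code: the "article" container selector, the helper names safe_text, extract_entries and fetch_entries, and the assumption that entry-title and date are CSS class names are all illustrative and should be checked against the real page markup.

```python
def safe_text(node):
    """Return the stripped text of a tag, or None when the lookup found nothing."""
    return node.get_text(strip=True) if node is not None else None


def extract_entries(soup, item_selector="article"):
    """Collect (title, date) pairs, tolerating entries that lack either element.

    item_selector is an assumption; inspect the page to find the real container.
    """
    entries = []
    for item in soup.select(item_selector):
        # select_one returns None instead of raising when nothing matches.
        title = safe_text(item.select_one(".entry-title"))
        date = safe_text(item.select_one(".date"))
        if title is None and date is None:
            continue  # skip containers that hold neither field
        entries.append((title, date))
    return entries


def fetch_entries(url):
    """Download a page and extract its entries (hypothetical convenience wrapper)."""
    # Third-party imports are local so the helpers above can be tried offline.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")
    return extract_entries(soup)
```

Calling fetch_entries("https://stopcensura.net/category/cronaca") would then return a list of tuples in which a missing title or date shows up as None rather than aborting the whole comprehension.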

How to check if content of webpage has been changed?

假如想象 posted on 2020-11-26 06:48:09
Question: I'm trying to run some code (Python 2.7) when the content of a website changes, and otherwise wait for a bit and check again later. I'm thinking of comparing hashes; the problem is that if the page changes by even a single byte or character, the hash will be different. For example, if the page displays the current date, the hash would differ every single time and tell me that the content has been updated. So how would you do this? Would you look at the KB size
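One common way around the "current date breaks the hash" problem is to strip the volatile parts of the page before hashing, so that only meaningful content contributes to the fingerprint. The following is a minimal sketch under stated assumptions: the patterns in VOLATILE_PATTERNS and the function names normalize, fingerprint and has_changed are hypothetical, and the code is written for Python 3 rather than the Python 2.7 mentioned in the question.

```python
import hashlib
import re

# Hypothetical patterns for content that changes on every request;
# adjust them to whatever the watched page actually contains.
VOLATILE_PATTERNS = [
    re.compile(r"\d{4}-\d{2}-\d{2}"),       # ISO dates such as 2020-11-26
    re.compile(r"\d{1,2}:\d{2}(:\d{2})?"),  # clock times such as 06:48 or 06:48:09
]


def normalize(text):
    """Remove volatile fragments and collapse whitespace."""
    for pattern in VOLATILE_PATTERNS:
        text = pattern.sub("", text)
    return " ".join(text.split())


def fingerprint(text):
    """Hash the normalized text so cosmetic differences do not matter."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def has_changed(old_text, new_text):
    """Report whether two page snapshots differ after normalization."""
    return fingerprint(old_text) != fingerprint(new_text)
```

In a polling loop, the stored fingerprint of the previous fetch would be compared with the fingerprint of the new one. Hashing only the text of the page region of interest, instead of the full HTML, narrows the comparison further.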