How to prevent unauthorized spidering

刺人心 2021-02-06 04:59

I want to prevent automated HTML scraping from one of our sites while not affecting legitimate spidering (Googlebot, etc.). Is there something that already exists to accomplish this?

6 Answers
  •  青春惊慌失措
    2021-02-06 05:55

    You should do what good firewalls do when they detect malicious use: let them keep going, but don't give them anything useful. If you start throwing 403s or 404s, they'll know something is wrong. If you return random data, they'll go about their business.
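
    A minimal sketch of the "keep them going, but feed them junk" idea, assuming a Flask app; FLAGGED_IPS and make_decoy_page() are hypothetical names invented for this example, not part of any existing library:

        import random

        from flask import Flask, request

        app = Flask(__name__)

        FLAGGED_IPS = set()   # filled in by the trap-link handler below
        WORDS = ["alpha", "beta", "gamma", "delta", "omega"]

        def make_decoy_page() -> str:
            # Plausible-looking but worthless HTML.
            rows = "".join(
                f"<li>{random.choice(WORDS)} {random.randint(1, 9999)}</li>"
                for _ in range(20)
            )
            return f"<html><body><ul>{rows}</ul></body></html>"

        @app.before_request
        def serve_decoys_to_flagged_clients():
            # No 403/404 that would tip the scraper off: flagged clients
            # get a 200 with random data and go about their business.
            if request.remote_addr in FLAGGED_IPS:
                return make_decoy_page(), 200
            return None   # everyone else falls through to the real view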

    For detecting malicious use, try adding a trap link on your search results page (or whatever page they are using as a site map) and hiding it with CSS. You'll need to check whether a visitor claims to be a valid bot and let genuine ones through, though. You can also store the offending IP for future use and a quick ARIN WHOIS lookup.
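
    A sketch of the trap link and the bot check, continuing the Flask example above; the /no-follow-trap URL and is_verified_googlebot() are made-up names. Verifying a claimed Googlebot with a forward-confirmed reverse DNS lookup is Google's documented method:

        import socket

        # Hidden somewhere only a scraper parsing raw HTML will follow it:
        # <a href="/no-follow-trap" style="display:none" rel="nofollow">x</a>

        def is_verified_googlebot(ip: str) -> bool:
            # Reverse lookup must land in googlebot.com/google.com, and the
            # forward lookup of that hostname must resolve back to the IP.
            try:
                hostname = socket.gethostbyaddr(ip)[0]
                if not hostname.endswith((".googlebot.com", ".google.com")):
                    return False
                return socket.gethostbyname(hostname) == ip
            except OSError:
                return False

        @app.route("/no-follow-trap")
        def trap():
            ip = request.remote_addr
            ua = request.headers.get("User-Agent", "")
            # Well-behaved crawlers honoring rel="nofollow" shouldn't land
            # here; anyone claiming to be Googlebot gets double-checked.
            if "Googlebot" in ua and is_verified_googlebot(ip):
                return "", 204
            FLAGGED_IPS.add(ip)   # keep the IP for later (and a WHOIS lookup)
            return make_decoy_page(), 200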
