How to prevent unauthorized spidering

刺人心 2021-02-06 04:59

I want to prevent automated HTML scraping from one of our sites while not affecting legitimate spidering (Googlebot, etc.). Is there something that already exists to accomplish this?

6 Answers
  •  长情又很酷
    2021-02-06 05:56

    If you want to protect yourself from a generic crawler, use a honeypot.

    See, for example, http://www.sqlite.org/cvstrac/honeypot. A good spider will not open this page, because the site's robots.txt disallows it explicitly. A human may open it, but is not supposed to click the "I am a spider" link. A bad spider will certainly follow both links and so betray its true identity.
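    A minimal sketch of that idea, assuming a Flask application; the /trap path, the hidden "I am a spider" link, and the in-memory ban set are illustrative choices of mine, not anything prescribed by the sqlite.org page:

        from flask import Flask, abort, request

        app = Flask(__name__)
        banned_ips = set()  # illustrative: production code would use a shared store

        ROBOTS_TXT = "User-agent: *\nDisallow: /trap\n"

        @app.route("/robots.txt")
        def robots():
            # Well-behaved crawlers read this and never fetch /trap.
            return ROBOTS_TXT, 200, {"Content-Type": "text/plain"}

        @app.route("/")
        def index():
            # Normal pages carry an invisible link to the trap; humans do not
            # see it, and compliant crawlers skip it because of robots.txt.
            return '<p>content</p><a href="/trap" style="display:none">I am a spider</a>'

        @app.route("/trap")
        def trap():
            # Only a crawler that ignores robots.txt (or follows hidden links)
            # ends up here, so record and block it.
            banned_ips.add(request.remote_addr)
            abort(403)

        @app.before_request
        def block_banned():
            if request.remote_addr in banned_ips:
                abort(403)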

    If the crawler is created specifically for your site, you can (in theory) create a moving honeypot.
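    One way a moving honeypot could look, under the same assumptions; the rotating HMAC-based trap URL (tied to the visitor's IP and the current hour) is my own illustrative design, so a site-specific scraper cannot simply hard-code one path to avoid:

        import hashlib
        import hmac
        import time

        from flask import Flask, abort, request

        app = Flask(__name__)
        SECRET = b"replace-with-a-real-secret"
        banned_ips = set()

        def trap_token(ip: str) -> str:
            # Rotate the trap URL every hour and tie it to the visitor's IP.
            hour = str(int(time.time() // 3600)).encode()
            return hmac.new(SECRET, ip.encode() + hour, hashlib.sha256).hexdigest()[:16]

        @app.route("/page")
        def page():
            # Embed the per-visitor trap link invisibly in normal pages.
            tok = trap_token(request.remote_addr)
            return f'<p>content</p><a href="/t/{tok}" style="display:none">do not follow</a>'

        @app.route("/t/<tok>")
        def trap(tok):
            # Anything fetching its own moving trap URL gets banned.
            if hmac.compare_digest(tok, trap_token(request.remote_addr)):
                banned_ips.add(request.remote_addr)
            abort(403)

        @app.before_request
        def block_banned():
            if request.remote_addr in banned_ips:
                abort(403)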
