Robots.txt: allow only major search engines

不思量自难忘° 2020-12-31 01:11

Is there a way to configure the robots.txt so that the site accepts visits ONLY from Google, Yahoo! and MSN spiders?

4 Answers
  • 2020-12-31 01:55

    Why?

    Anyone doing evil (e.g., gathering email addresses to spam) will just ignore robots.txt. So you're only going to be blocking legitimate search engines, as robots.txt compliance is voluntary.

    But — if you insist on doing it anyway — that's what the User-Agent: line in robots.txt is for.

    User-agent: googlebot
    Disallow: 
    
    User-agent: *
    Disallow: /
    

    With lines for all the other search engines you'd like traffic from, of course. Robotstxt.org has a partial list.
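If you want to check how a compliant crawler would read these rules before deploying them, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the agent name `SomeOtherBot`, and the `example.com` URL are placeholders for illustration, not part of any real setup. It only models well-behaved crawlers; anything that ignores robots.txt is unaffected.

```python
from urllib import robotparser

# Same rules as the robots.txt shown in this answer.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this product token may fetch url."""
    parser = robotparser.RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "SomeOtherBot" is a made-up agent that falls through to the "*" group.
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, is_allowed(agent, "https://example.com/page.html"))
```

Pass the crawler's short product token (such as `Googlebot`) rather than a full browser-style User-Agent header, since that is what this parser matches against.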

  • 2020-12-31 01:59

    As everyone knows, robots.txt is a standard that crawlers are expected to obey, so only well-behaved agents follow it. Adding it does nothing to stop the rest.

    If you have data that is not meant to be shown on the site at all, you can restrict its access permissions instead and improve security that way.

  • 2020-12-31 02:05

    User-agent: *
    Disallow: /

    User-agent: Googlebot
    Allow: /

    User-agent: Slurp
    Allow: /

    User-agent: msnbot
    Allow: /
    

    Slurp is Yahoo's robot.

  • 2020-12-31 02:05

    There are more than three major search engines, depending on which country you are talking about. Facebook seems to do a good job of listing only legitimate ones: https://facebook.com/robots.txt

    So your robots.txt can be something like:

    User-agent: Applebot
    Allow: /
    
    User-agent: baiduspider
    Allow: /
    
    User-agent: Bingbot
    Allow: /
    
    User-agent: Facebot
    Allow: /
    
    User-agent: Googlebot
    Allow: /
    
    User-agent: msnbot
    Allow: /
    
    User-agent: Naverbot
    Allow: /
    
    User-agent: seznambot
    Allow: /
    
    User-agent: Slurp
    Allow: /
    
    User-agent: teoma
    Allow: /
    
    User-agent: Twitterbot
    Allow: /
    
    User-agent: Yandex
    Allow: /
    
    User-agent: Yeti
    Allow: /
    
    User-agent: *
    Disallow: /
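
Since the file above is the same two-line group repeated per crawler, it can also be generated from a list. This is only a sketch: `build_allowlist_robots` is a hypothetical helper name, and the three tokens in the usage section are just a sample taken from the list above.

```python
def build_allowlist_robots(allowed_agents):
    """Build robots.txt text that allows the listed crawlers and blocks all others."""
    # One "allow everything" group per listed crawler token.
    groups = [f"User-agent: {agent}\nAllow: /\n" for agent in allowed_agents]
    # Catch-all group: every crawler not named above is asked to stay out.
    groups.append("User-agent: *\nDisallow: /\n")
    # Joining with a newline leaves one blank line between groups.
    return "\n".join(groups)


if __name__ == "__main__":
    # Sample subset of the crawler tokens listed in this answer.
    print(build_allowlist_robots(["Googlebot", "Bingbot", "Slurp"]), end="")
```

Keeping the tokens in one list makes it easier to add or drop a crawler later without hand-editing the groups.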
    