What does the dollar sign mean in robots.txt?

無奈伤痛 2021-01-18 12:26

I am curious about a website and want to do some web crawling at the /s path. Its robots.txt:

User-Agen         


        
1 Answer
  • If you follow the original robots.txt specification, $ has no special meaning, and no Allow field is defined. A conforming bot has to ignore fields it does not know, so such a bot would actually see this record:

    User-Agent: *
    Disallow: /
    

    However, the original robots.txt specification has been extended by various parties. But as the authors of the robots.txt in question did not target a specific bot, we don’t know which "extension" they had in mind.

    Typically (but not necessarily, as it’s not formally specified), Allow overrides rules specified in Disallow, and $ represents the end of the URL path.

    Following this interpretation (it’s, for example, used by Google), Allow: /$ would mean: You may crawl /, but you may not crawl /a, /b and so on.

    So crawling of URLs whose path starts with /s would not be allowed (neither according to the original spec, thanks to Disallow: /, nor according to Google’s extension).
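
To make the two readings concrete, here is a minimal Python sketch of the extended interpretation, not a full parser. It assumes the record contains `Allow: /$` and `Disallow: /`, which is inferred from the discussion above, and it assumes Google-style precedence (the longest matching pattern wins, and Allow wins a tie). The names `RULES`, `pattern_to_regex` and `is_allowed` are made up for illustration.

```python
import re

# Assumed rules, inferred from the answer above (hypothetical, for illustration).
RULES = [("allow", "/$"), ("disallow", "/")]


def pattern_to_regex(pattern):
    """Translate a path pattern: '*' matches any run of characters,
    and a trailing '$' anchors the match at the end of the URL path."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Longest matching pattern wins; on equal length, allow wins.
    A path that no rule matches is allowed."""
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


for path in ["/", "/s", "/s/page"]:
    print(path, is_allowed(path))
```

Under these assumptions only `/` is crawlable. Passing `rules=[("disallow", "/")]` models the original-spec reading, where the Allow line is ignored and even `/` is blocked.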
