robots.txt allow root only, disallow everything else?

清歌不尽 2021-02-01 12:47

I can't seem to get this to work, but it seems really basic.

I want the domain root to be crawled:

http://www.example.com

but nothing else.

2 answers

小鲜肉 (original poster)

2021-02-01 13:52
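According to the Backus-Naur Form (BNF) parsing definitions in Google's robots.txt documentation, the order of the Allow and Disallow directives doesn't matter: Google applies the most specific matching rule (the one with the longest path), and when an Allow and a Disallow rule are equally specific, the less restrictive Allow rule wins. So changing the order really won't help you.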
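To illustrate why order is irrelevant, here is a minimal Python sketch of Google-style rule selection. It assumes Google's documented behavior (longest matching pattern wins, ties go to Allow) and, for brevity, handles only plain prefix patterns without `*` or `$`; the `is_allowed` helper and the `/public/` path are hypothetical.

```python
def is_allowed(rules, path):
    """rules: list of (directive, pattern), directive is 'allow' or 'disallow'.

    Hypothetical helper: prefix matching only, no '*' or '$' support.
    """
    best = None  # (pattern length, is_allow) of the winning rule so far
    for directive, pattern in rules:
        if path.startswith(pattern):
            candidate = (len(pattern), directive == "allow")
            # Longer pattern wins; on equal length, allow (True) beats disallow (False).
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path is allowed.
    return True if best is None else best[1]

rules = [("disallow", "/"), ("allow", "/public/")]
reversed_rules = list(reversed(rules))

for path in ["/", "/public/page.html", "/private/page.html"]:
    # File order must not change the outcome.
    assert is_allowed(rules, path) == is_allowed(reversed_rules, path)
    print(path, is_allowed(rules, path))
# / False
# /public/page.html True
# /private/page.html False
```

Because the winner is chosen by comparing pattern lengths rather than by position in the file, reversing the list cannot change any verdict.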

Instead, use the `$` operator to mark the end of your path. `$` means "the end of the URL" (i.e., don't match anything beyond this point).
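A rough sketch of how `$` changes matching, assuming Google's documented wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end of the path, everything else is a prefix match) and using Python's `re` module. The `matches` helper is hypothetical, not part of any crawler's API:

```python
import re

def matches(pattern, path):
    """Return True if a robots.txt path pattern matches the given URL path.

    Hypothetical helper assuming Google's documented semantics:
    '*' matches any run of characters, a trailing '$' anchors the end,
    and otherwise the pattern is matched as a prefix.
    """
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # Escape literal characters, then turn each escaped '*' into '.*'.
    regex = re.escape(core).replace(r"\*", ".*")
    if anchored:
        return re.fullmatch(regex, path) is not None  # must consume the whole path
    return re.match(regex, path) is not None  # prefix match from the start

for path in ["/", "/index.html", "/blog/post"]:
    print(path, matches("/", path), matches("/$", path))
# / True True
# /index.html True False
# /blog/post True False
```

Without `$`, the pattern `/` is a prefix of every path, which is why `Disallow: /` on its own blocks the whole site; `/$` matches only the bare root.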

Test this robots.txt. I'm certain it will work for you (I've also verified it in Google Search Console):
```
user-agent: *
Allow: /$
Disallow: /
```
This allows both http://www.example.com and http://www.example.com/ to be crawled, while everything else is blocked.

Note: this Allow directive satisfies your particular use case, but if you have an index.html or default.php, those URLs (e.g. http://www.example.com/index.html) will not be crawled, because `/$` matches only the bare root path.
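To sanity-check the two rules offline, here is a small Python sketch that combines `$`/`*` matching with longest-match precedence and evaluates the rules in both orders. It assumes Google's documented semantics; Python's standard `urllib.robotparser` evaluates rules in file order with plain prefix matching, so it would not reproduce this behavior, and the helpers below are hypothetical.

```python
import re
from urllib.parse import urlsplit

# The two rules from the answer, as (directive, pattern) pairs.
RULES = [("allow", "/$"), ("disallow", "/")]

def matches(pattern, path):
    """Hypothetical Google-style match: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = re.escape(core).replace(r"\*", ".*")
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path) is not None

def is_allowed(rules, url):
    """Longest matching pattern wins; on a tie, allow wins; no match means allowed."""
    parts = urlsplit(url)
    path = parts.path or "/"  # "http://www.example.com" has an empty path
    if parts.query:
        path += "?" + parts.query
    best = None
    for directive, pattern in rules:
        if matches(pattern, path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

urls = [
    "http://www.example.com",
    "http://www.example.com/",
    "http://www.example.com/index.html",
    "http://www.example.com/default.php",
    "http://www.example.com/about",
]
for url in urls:
    # Swapping the order of the two rules must not change the verdict.
    assert is_allowed(RULES, url) == is_allowed(RULES[::-1], url)
    print(url, "allowed" if is_allowed(RULES, url) else "blocked")
# http://www.example.com allowed
# http://www.example.com/ allowed
# http://www.example.com/index.html blocked
# http://www.example.com/default.php blocked
# http://www.example.com/about blocked
```

The bare domain is requested as path `/`, so it matches `Allow: /$`, which is the longer pattern (a tie would go to Allow anyway). `/index.html` and `/default.php` only match `Disallow: /`, which is why they end up blocked.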

Side note: I'm only really familiar with Googlebot and Bingbot behavior. If you are targeting other search engines, they may or may not have specific rules about how the directives are ordered. So if you want to be extra sure, you can always swap the positions of the Allow and Disallow lines; I only ordered them this way to debunk some of the comments.