Robots.txt, how to allow access only to domain root, and no deeper? [closed]

Submitted by 怎甘沉沦 on 2019-12-21 07:16:14

Question


I want to allow crawlers to access my domain's root directory (i.e. the index.html file), but nothing deeper (i.e. no subdirectories). I do not want to have to list and deny every subdirectory individually within the robots.txt file. Currently I have the following, but I think it is blocking everything, including stuff in the domain's root.

User-agent: *
Allow: /$
Disallow: /

How can I write my robots.txt to accomplish what I am trying for?

Thanks in advance!


Answer 1:


There's nothing that will work for all crawlers. There are two options that might be useful to you.

Robots that allow wildcards should support something like:

Disallow: /*/

The major search engine crawlers understand the wildcards, but unfortunately most of the smaller ones don't.
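To see why Disallow: /*/ blocks subdirectories but leaves root-level files alone, here is a minimal sketch of how a wildcard-aware crawler might test a path against a rule. The function name rule_matches is made up for illustration, and the translation to a regular expression is an assumption about a typical implementation, not any particular crawler's code.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches the URL path.

    '*' matches any run of characters; a trailing '$' anchors the
    pattern to the end of the path. Everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors at the start, which gives prefix semantics.
    return re.match(regex, path) is not None


# A path needs a second slash to match '/*/', so root-level files stay crawlable.
subdir_blocked = rule_matches("/*/", "/sub/page.html")
root_file_blocked = rule_matches("/*/", "/index.html")
```

A crawler without wildcard support would most likely read /*/ as a literal prefix, which matches almost nothing, so the rule would simply have no effect for it.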

If you have relatively few files in the root and you don't often add new files, you could use Allow to allow access to just those files, and then use Disallow: / to restrict everything else. That is:

User-agent: *
Allow: /index.html
Allow: /coolstuff.jpg
Allow: /morecoolstuff.html
Disallow: /

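The order here is important. The original robots.txt draft says crawlers take the first matching rule, so if your first rule were Disallow: /, a crawler following that draft wouldn't get to the Allow lines after it. Google and RFC 9309 instead apply the most specific (longest) matching rule, so order doesn't matter to them; putting the Allow lines first is safe for both kinds of crawler.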
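The effect of rule order can be sketched with a tiny first-match evaluator. This assumes plain prefix rules with no wildcards; the name first_match_allows and the (directive, prefix) tuples are illustrative only, not any crawler's actual API.

```python
def first_match_allows(rules, path):
    """Evaluate rules in file order and use the first one whose prefix matches."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "Allow"
    return True  # no rule matched, so the path is allowed by default


# The rules from the example above, in the recommended order.
ALLOW_FIRST = [
    ("Allow", "/index.html"),
    ("Allow", "/coolstuff.jpg"),
    ("Allow", "/morecoolstuff.html"),
    ("Disallow", "/"),
]

# The same rules with Disallow: / moved to the top.
DISALLOW_FIRST = [ALLOW_FIRST[-1]] + ALLOW_FIRST[:-1]
```

With ALLOW_FIRST, /index.html is allowed and anything else is blocked. With DISALLOW_FIRST, the Disallow: / prefix matches every path first, so even /index.html is blocked.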

If a crawler doesn't support Allow, then it's going to see the Disallow: / and not crawl anything on your site, provided, of course, that it ignores lines in robots.txt that it doesn't understand.

All the major search engine crawlers support Allow, and a lot of the smaller ones do, too. It's easy to implement.
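For crawlers that pick the most specific rule rather than the first one (the precedence described in Google's documentation and in RFC 9309), the rules from the question, Allow: /$ followed by Disallow: /, should already allow the root and nothing else. The sketch below assumes specificity is measured by pattern length and that Allow wins a tie; the helper names are invented for illustration.

```python
import re


def _pattern_to_regex(pattern):
    """Translate a robots.txt pattern ('*' wildcard, trailing '$' anchor) to a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def longest_match_allows(rules, path):
    """Use the longest matching pattern; on a tie, prefer Allow."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for directive, pattern in rules:
        if _pattern_to_regex(pattern).match(path):
            is_allow = directive == "Allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# The rules from the question.
QUESTION_RULES = [("Allow", "/$"), ("Disallow", "/")]
```

Under these assumptions, longest_match_allows(QUESTION_RULES, "/") is True, while /index.html and /sub/page.html are both blocked, because only the bare root matches the longer /$ pattern.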




Answer 2:


In short, no: there is no way to do this nicely using the original robots.txt standard. Remember that Disallow specifies a path prefix. Wildcards and Allow are not part of that original standard.

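So the following approach (a kludge!) will work: it disallows every path whose first character after the slash is a letter or a digit, which leaves / itself crawlable.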

User-agent: *
Disallow: /a
Disallow: /b
Disallow: /c
...
Disallow: /z
Disallow: /A
Disallow: /B
Disallow: /C
...
Disallow: /Z
Disallow: /0
Disallow: /1
Disallow: /2
...
Disallow: /9
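Rather than typing out all the rules by hand, the file could be generated. This is a small sketch that assumes only ASCII letters and digits need covering, in the same order as the listing above; the function name kludge_robots_txt is made up for the example.

```python
import string


def kludge_robots_txt() -> str:
    """Build a robots.txt that disallows every path starting with a letter or digit."""
    # Same order as the hand-written listing: a-z, then A-Z, then 0-9.
    first_chars = string.ascii_lowercase + string.ascii_uppercase + string.digits
    lines = ["User-agent: *"]
    lines += [f"Disallow: /{ch}" for ch in first_chars]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(kludge_robots_txt(), end="")
```

Paths that start with any other character, such as an underscore or a hyphen, are not covered and would need their own Disallow lines.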


Source: https://stackoverflow.com/questions/5206602/robots-txt-how-to-allow-access-only-to-domain-root-and-no-deeper
