Question
I want to allow crawlers to access my domain's root directory (i.e. the index.html file), but nothing deeper (i.e. no subdirectories). I do not want to have to list and deny every subdirectory individually within the robots.txt file. Currently I have the following, but I think it is blocking everything, including stuff in the domain's root.
User-agent: *
Allow: /$
Disallow: /
How can I write my robots.txt to accomplish what I am trying for?
Thanks in advance!
Answer 1:
There's nothing that will work for all crawlers. There are two options that might be useful to you.
Robots that allow wildcards should support something like:
Disallow: /*/
The major search engine crawlers understand the wildcards, but unfortunately most of the smaller ones don't.
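To see why Disallow: /*/ blocks subdirectories but not files in the root, here is a minimal Python sketch of wildcard matching, assuming the common convention that * matches any run of characters and a trailing $ anchors the end of the path. The function names pattern_to_regex and rule_matches are hypothetical, and real crawlers may differ in details.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters,
    and a trailing '$' anchors the pattern to the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def rule_matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start, so patterns behave as path prefixes.
    return pattern_to_regex(pattern).match(path) is not None

for path in ["/", "/index.html", "/sub/", "/sub/page.html"]:
    print(path, rule_matches("/*/", path), rule_matches("/$", path))
```

Under these assumptions, /*/ matches only paths that contain a second slash, and /$ matches only the bare root URL.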
If you have relatively few files in the root and you don't often add new files, you could use Allow to allow access to just those files, and then use Disallow: / to restrict everything else. That is:
User-agent: *
Allow: /index.html
Allow: /coolstuff.jpg
Allow: /morecoolstuff.html
Disallow: /
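The order here is important. Crawlers that follow the original robots.txt draft take the first matching rule, so if your first rule were Disallow: /, a properly behaving crawler would never get to the following Allow lines. (Google and RFC 9309 use the most specific, i.e. longest, matching rule instead, so for those crawlers the order does not matter.)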
If a crawler doesn't support Allow, then it's going to see the Disallow: / and not crawl anything on your site, provided, of course, that it ignores directives in robots.txt that it doesn't understand.
All the major search engine crawlers support Allow, and a lot of the smaller ones do, too. It's easy to implement.
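One way to check the allow-list approach is Python's standard urllib.robotparser, which happens to apply first-match semantics and so also shows why the rule order matters. This is only a sketch: https://example.com and the user agent name AnyBot are placeholders, and other crawlers may evaluate the same file differently.

```python
from urllib.robotparser import RobotFileParser

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# The allow-list from the answer: Allow lines first, Disallow: / last.
ALLOW_FIRST = """\
User-agent: *
Allow: /index.html
Allow: /coolstuff.jpg
Allow: /morecoolstuff.html
Disallow: /
"""

# Same idea with the blanket Disallow moved to the top.
DISALLOW_FIRST = """\
User-agent: *
Disallow: /
Allow: /index.html
"""

SITE = "https://example.com"  # placeholder domain

allow_first = build_parser(ALLOW_FIRST)
disallow_first = build_parser(DISALLOW_FIRST)

for path in ["/index.html", "/coolstuff.jpg", "/sub/page.html"]:
    print(path,
          allow_first.can_fetch("AnyBot", SITE + path),
          disallow_first.can_fetch("AnyBot", SITE + path))
```

With a first-match parser like this one, the second file blocks /index.html as well, because Disallow: / is reached before the Allow line.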
Answer 2:
In short, no: there is no way to do this nicely using the original robots.txt standard. Remember that Disallow specifies a path prefix. Wildcards and Allow are extensions that are not part of that standard.
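So the following approach (a kludge!) will work: disallow each letter and digit that a path could start with, which leaves only the root URL / itself crawlable.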
User-agent: *
Disallow: /a
Disallow: /b
Disallow: /c
...
Disallow: /z
Disallow: /A
Disallow: /B
Disallow: /C
...
Disallow: /Z
Disallow: /0
Disallow: /1
Disallow: /2
...
Disallow: /9
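Rather than typing the list out by hand, it could be generated. The following Python sketch assumes the ellipses above stand for the remaining ASCII letters and digits; the function name kludge_robots_txt is hypothetical. Note that paths starting with other characters (for example _ or ~) would not be covered by this list.

```python
import string

def kludge_robots_txt() -> str:
    """Build a robots.txt that disallows every path whose first
    character after the leading slash is an ASCII letter or digit."""
    # Assumption: a-z, A-Z and 0-9 are the only first characters in use.
    first_chars = string.ascii_lowercase + string.ascii_uppercase + string.digits
    lines = ["User-agent: *"]
    lines += [f"Disallow: /{ch}" for ch in first_chars]
    return "\n".join(lines) + "\n"

print(kludge_robots_txt())
```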
Source: https://stackoverflow.com/questions/5206602/robots-txt-how-to-allow-access-only-to-domain-root-and-no-deeper