Question
I have a site with the following structure:
http://www.example.com/folder1/folder2/folder3
I would like to disallow indexing in folder1 and folder2, but I would like the robots to index everything under folder3.
Is there a way to do this with robots.txt?
From what I have read, I think that everything inside a disallowed folder is disallowed.
Would the following achieve my goal?
user-agent: *
Crawl-delay: 0
Sitemap: <Sitemap url>
Allow: /folder1/folder2/folder3
Disallow: /folder1/folder2/
Disallow: /folder1/
Allow: /
Answer 1:
Yes, it works. However, Google has a tool to test your robots.txt file:
go to Google Webmaster Tools (https://www.google.com/webmasters/tools/)
and open the section "Site configuration -> Crawler access".
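As a local alternative, Python's standard-library urllib.robotparser can parse the same rules and answer per-URL queries. A minimal sketch, assuming the robots.txt from the question (the placeholder Sitemap line is omitted and the test URLs are hypothetical); note that this parser applies Allow/Disallow rules in file order rather than by longest match, which still gives the intended result here because the Allow line comes first.

from urllib.robotparser import RobotFileParser

# The rules from the question, minus the placeholder Sitemap line.
rules = """\
user-agent: *
Crawl-delay: 0
Allow: /folder1/folder2/folder3
Disallow: /folder1/folder2/
Disallow: /folder1/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical URLs following the structure in the question.
for url in (
    "http://www.example.com/folder1/folder2/folder3/page.html",  # should be allowed
    "http://www.example.com/folder1/folder2/page.html",          # should be blocked
    "http://www.example.com/folder1/page.html",                   # should be blocked
    "http://www.example.com/page.html",                           # should be allowed
):
    print(parser.can_fetch("*", url), url)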
Answer 2:
All you would need is:
user-agent: *
Crawl-delay: 0
Sitemap:
Allow: /folder1/folder2/folder3
Disallow: /folder1/
Allow: /
At least Googlebot will see the more specific Allow for that one directory and will still disallow everything else under folder1. This is backed up by this post by a Google employee.
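Google documents this precedence as "most specific rule wins": when both an Allow and a Disallow match a path, the rule with the longer matched path applies, regardless of the order in the file, and ties go to Allow. A rough sketch of that resolution logic (plain prefix matching only, no * or $ wildcards; not Google's actual implementation):

# Rules from the answer above, as (kind, path) pairs.
RULES = [
    ("allow", "/folder1/folder2/folder3"),
    ("disallow", "/folder1/"),
    ("allow", "/"),
]

def allowed(path):
    # Longest matching rule wins; on a tie, Allow wins.
    matches = [(len(rule_path), kind == "allow")
               for kind, rule_path in RULES
               if path.startswith(rule_path)]
    return max(matches)[1] if matches else True

print(allowed("/folder1/folder2/folder3/page.html"))  # True: longest match is the Allow
print(allowed("/folder1/folder2/page.html"))          # False: longest match is Disallow /folder1/
print(allowed("/page.html"))                          # True: only "Allow: /" matches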
Answer 3:
Line breaks in records are not allowed, so your original robots.txt should look like this:
user-agent: *
Crawl-delay: 0
Sitemap: <Sitemap url>
Allow: /folder1/folder2/folder3
Disallow: /folder1/folder2/
Disallow: /folder1/
Allow: /
Possible improvements:
- Specifying Allow: / is superfluous, as it's the default anyway.
- Specifying Disallow: /folder1/folder2/ is superfluous, as Disallow: /folder1/ is sufficient.
- As Sitemap is not per record, but for all bots, you could specify it as a separate block.
So your robots.txt could look like this:
User-agent: *
Crawl-delay: 0
Allow: /folder1/folder2/folder3
Disallow: /folder1/
Sitemap: http://example.com/sitemap
(Note that the Allow field is not part of the original robots.txt specification, so don't expect all bots to understand it.)
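To illustrate that caveat: a crawler that only implements the original, Disallow-only specification ignores the Allow line entirely, so folder3 would be blocked along with the rest of /folder1/. A minimal hypothetical sketch of that stricter reading:

# Hypothetical crawler honouring only Disallow lines (original robots.txt spec).
DISALLOWED = ["/folder1/"]   # from "Disallow: /folder1/" above; Allow lines are ignored

def old_spec_allowed(path):
    return not any(path.startswith(prefix) for prefix in DISALLOWED)

print(old_spec_allowed("/folder1/folder2/folder3/page.html"))  # False: the Allow is never seen
print(old_spec_allowed("/page.html"))                          # True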
Source: https://stackoverflow.com/questions/5998434/blocking-folders-inbetween-allowed-content