Robots.txt Allow sub folder but not the parent

对着背影说爱祢 posted on 2019-12-30 05:38:08

Question


Can anybody please explain the correct robots.txt directives for the following scenario?

I would like to allow access to:

/directory/subdirectory/..

But I would also like to restrict access to /directory/, notwithstanding the above exception.


Answer 1:


Be aware that there is no real official standard and that any web crawler may happily ignore your robots.txt.

According to a Google Groups post, the following works at least with Googlebot:

User-agent: Googlebot 
Disallow: /directory/ 
Allow: /directory/subdirectory/
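
To see why the Allow line can override the broader Disallow, here is a minimal Python sketch of longest-match precedence, which is how Googlebot is documented to resolve conflicting rules (the most specific path wins, and Allow wins a tie). The function name `is_allowed` and the `RULES` list are made up for this illustration, wildcards are not handled, and other crawlers may resolve conflicts differently.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under longest-match precedence.

    `rules` is a list of (directive, prefix) pairs. Wildcards (* and $)
    are deliberately not handled in this sketch.
    """
    best_len = -1
    allowed = True  # a path matched by no rule is allowed
    for directive, prefix in rules:
        # An empty prefix matches nothing; skip rules that do not apply.
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # The longer prefix wins; on a tie, Allow (least restrictive) wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# Hypothetical rule set mirroring the robots.txt above.
RULES = [
    ("Disallow", "/directory/"),
    ("Allow", "/directory/subdirectory/"),
]

if __name__ == "__main__":
    for p in ("/directory/subdirectory/page.html", "/directory/other.html", "/elsewhere/"):
        print(p, is_allowed(RULES, p))
```

Under this rule the order of the two lines does not matter, because the longer path /directory/subdirectory/ is always the more specific match.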



Answer 2:


If these are truly directories, then the accepted answer is probably your best choice. But if you're writing an application and the directories are dynamically generated paths (a.k.a. contexts, routes, etc.), then you might want to use meta tags instead of defining the rules in robots.txt. This gives you the advantage of not having to worry about how different crawlers may interpret or prioritize the access rules for the subdirectory path. Note that a crawler can only see a meta tag on a page it is allowed to fetch, so those paths must not also be disallowed in robots.txt.

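You might try something like this in your page template (ERB-style syntax is assumed here; is_parent_directory_path stands for whatever check your application uses):

<% if is_parent_directory_path %>
  <meta name="robots" content="noindex, nofollow">
<% end %>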
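
The same conditional can be sketched in Python for applications that build the page head in code. This is only an illustration: the helper name `robots_meta_for` and the two prefix constants are assumptions taken from the paths in the question, not part of any framework.

```python
# Hypothetical path prefixes taken from the question.
BLOCKED_PREFIX = "/directory/"
ALLOWED_PREFIX = "/directory/subdirectory/"

NOINDEX_TAG = '<meta name="robots" content="noindex, nofollow">'


def robots_meta_for(path):
    """Return the robots meta tag for `path`, or "" if the page may be indexed."""
    under_parent = path.startswith(BLOCKED_PREFIX)
    under_exception = path.startswith(ALLOWED_PREFIX)
    # Only pages under the parent but outside the allowed subdirectory get the tag.
    if under_parent and not under_exception:
        return NOINDEX_TAG
    return ""


if __name__ == "__main__":
    for p in ("/directory/report.html", "/directory/subdirectory/report.html", "/about.html"):
        print(p, repr(robots_meta_for(p)))
```

The returned string would be placed inside the page's head element by whatever templating layer the application uses.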



Answer 3:


I would recommend using Google's robots.txt tester, available in Google Webmaster Tools: https://support.google.com/webmasters/answer/6062598?hl=en

You can edit and test URLs right in the tool, plus you get a wealth of other tools as well.
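
For a quick local check without Google's tool, Python's standard library ships `urllib.robotparser`. The sketch below is not a substitute for the tester: as far as I know this parser applies the first matching rule in file order rather than the longest match, so the Allow line is listed first here, which gives the same answer under either policy. The helper name `build_parser` and the sample paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Allow is listed before Disallow so that first-match and longest-match
# parsers agree on the result.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /directory/subdirectory/
Disallow: /directory/
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for path in ("/directory/subdirectory/page.html", "/directory/private.html", "/index.html"):
        print(path, parser.can_fetch("Googlebot", path))
```

Results from a local parser only show how that parser reads the file; a given crawler may still interpret the same rules differently.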



Source: https://stackoverflow.com/questions/7609031/robots-txt-allow-sub-folder-but-not-the-parent
