block search engine crawling directory

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-20 06:36:22

Question


The longest URL structure on my site looks like this:

http://www.example.com/xyz-pqr/abcd-efgh/123.html

So there is a maximum of 3 directory levels, but because of the CMS and other problems, search engines are indexing URLs on my site with more than 3 directory levels, such as:

http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/123.html
http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/abcd-efgh/123.html

I want to write rules in robots.txt so that search engines never crawl more than 3 directory levels. How do I do this? Thanks in advance...


Answer 1:


I'm not certain, but I think the following should work for crawlers that support the * wildcard in robots.txt:

User-agent: *
Disallow: /*/*/*/

So, given these two URLs:

http://www.example.com/xyz-pqr/abcd-efgh/123.html
http://www.example.com/xyz-pqr/abcd-efgh/foo-bar/123.html

The first would be accepted because it has only two directory segments (/xyz-pqr/abcd-efgh/), so it never reaches the fourth slash the pattern requires.

The second would be blocked because it has three directory segments.

And anything longer would be blocked, as well.
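
To sanity-check the rule against these URLs, here is a minimal Python sketch of wildcard matching as wildcard-aware crawlers are generally understood to apply it: the rule is matched from the start of the path, * matches any run of characters (including /), and a trailing $ anchors the end. The helper names rule_to_regex and is_blocked are hypothetical and not part of any library, and a real crawler's matcher may differ in details.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex (assumed wildcard semantics)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # '*' becomes '.*'; every other character is matched literally.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(url: str, rule: str = "/*/*/*/") -> bool:
    """Return True if the URL's path matches the Disallow rule."""
    path = urlsplit(url).path or "/"
    # re.match anchors at the start of the path, like a robots.txt prefix rule.
    return rule_to_regex(rule).match(path) is not None


print(is_blocked("http://www.example.com/xyz-pqr/abcd-efgh/123.html"))          # False
print(is_blocked("http://www.example.com/xyz-pqr/abcd-efgh/foo-bar/123.html"))  # True
```

Under these assumptions the rule only fires once a path contains four slashes, which is why the two-directory URL passes and the deeper ones do not.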



Source: https://stackoverflow.com/questions/22341736/block-search-engine-crawling-directory
