Question
The longest URL structure on my site looks like this:
http://www.example.com/xyz-pqr/abcd-efgh/123.html
So there is a maximum of 3 directory levels, but because of the CMS and other problems, search engines are indexing URLs on my site with more than 3 directory levels, such as:
http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/123.html
http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/abcd-efgh/123.html
I want to write rules in robots.txt so that search engines never crawl URLs with more than 3 directory levels. How do I do this? Thanks in advance.
Answer 1:
I'm not certain, but I think the following should work:
User-agent: *
Disallow: /*/*/*/
So, given these two URLs:
http://www.example.com/xyz-pqr/abcd-efgh/123.html
http://www.example.com/xyz-pqr/abcd-efgh/foo-bar/123.html
The first would be allowed because it has only two directory segments (/xyz-pqr/abcd-efgh/), so the pattern, which needs three segments followed by a slash, does not match it.
The second would be blocked because it has three directory segments.
And anything longer would be blocked, as well.
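If you want to sanity-check the rule before deploying it, here is a rough sketch in Python that mimics how wildcard-aware crawlers are generally described as matching a Disallow pattern (`*` matches any run of characters, a trailing `$` anchors the end). The helper names `robots_pattern_to_regex` and `is_blocked` are made up for this illustration, and real crawlers may differ in details. I'm not using the standard-library `urllib.robotparser` here because, as far as I know, it does not interpret `*` wildcards.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (hypothetical helper).

    Assumes the common wildcard semantics: `*` matches any sequence of
    characters, and a trailing `$` anchors the match at the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and join them with ".*" where each "*" was.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow: str = "/*/*/*/") -> bool:
    """Return True if the URL path matches the Disallow pattern."""
    return robots_pattern_to_regex(disallow).match(path) is not None


if __name__ == "__main__":
    paths = [
        "/xyz-pqr/abcd-efgh/123.html",
        "/xyz-pqr/abcd-efgh/foo-bar/123.html",
        "/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/123.html",
    ]
    for path in paths:
        print(path, "->", "blocked" if is_blocked(path) else "allowed")
```

Only the path part of the URL is passed in, since robots.txt rules are matched against the path and not the scheme or host. Treat this as a local check of the pattern logic only; each search engine's own robots.txt testing tool is the better authority on how that engine will behave.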
Source: https://stackoverflow.com/questions/22341736/block-search-engine-crawling-directory