Question
I was unable to find information about my case. I want to prevent the following type of URL from being indexed:
website.com/video-title/video-title/
(my website produces such duplicate URLs of my video articles)
Each video article's URL starts with the word "video".
So what I want to do is block every URL of the form website.com/any-url/video-any-url.
This way I would get rid of all the duplicates. Could somebody help me?
Answer 1:
This is not possible in the original robots.txt specification. But some parsers may support wildcards in Disallow values anyway; Google, for example, says:
Googlebot (but not all search engines) respects some pattern matching.
So for Google’s bots, you could use the following line:
Disallow: /*/video
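For context, a complete robots.txt using this rule could look like the sketch below. Scoping it to a Googlebot group is an assumption on my part, since only Google is confirmed here to handle the wildcard:

User-agent: Googlebot
Disallow: /*/video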
The /*/video pattern should block any URL whose path starts with anything and then contains /video, for example:
/foo/video
/foo/videos
/foo/video.html
/foo/video/bar
/foo/bar/videos
/foo/bar/foo/bar/videos
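A quick way to sanity-check such patterns is to translate them into regular expressions the way Google-style matchers do: * becomes .*, $ anchors the end, and a rule matches any path it matches as a prefix. The Python sketch below illustrates that matching logic; it is not Google's actual code, and google_style_match is a made-up name:

import re

def google_style_match(pattern: str, path: str) -> bool:
    # Translate robots.txt wildcards: '*' matches any run of characters,
    # '$' anchors the end of the path; everything else is literal.
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    # re.match anchors at the start only, so trailing characters
    # are allowed, i.e., the rule matches path prefixes.
    return re.match(regex, path) is not None

for path in ["/foo/video", "/foo/videos", "/foo/video.html",
             "/foo/video/bar", "/foo/bar/videos", "/video/title"]:
    print(path, google_style_match("/*/video", path))
# Prints True for every path except /video/title, which has no
# path segment before "video".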
Parsers that don't support wildcards would interpret the line literally, i.e., they would only block URLs whose paths literally start with /*/video:
/*/video
/*/videos
/*/video/foo
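Python's standard-library robots.txt parser follows the original specification and shows exactly this literal behavior; the website.com URLs and the MyBot user agent below are just placeholders:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /*/video",
])

# No wildcard expansion here: the rule only matches paths that
# literally start with "/*/video", so the real video URL stays fetchable.
print(rp.can_fetch("MyBot", "http://website.com/foo/video"))  # True (not blocked)
print(rp.can_fetch("MyBot", "http://website.com/*/video"))    # False (blocked)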
Source: https://stackoverflow.com/questions/21734781/robots-txt-restriction-of-category-urls