Question
I'm not even sure this is the best way to handle it, but I made a temporary mistake with my URL rewrites and Google (and possibly other search engines) picked it up; now it has those URLs indexed and keeps running into errors.
Basically, I'm generating URLs based on a variety of factors, one being the id of an article, which is automatically generated. These then redirect to the correct spot.
I had first accidentally set up stuff like this:
/2343/news/blahblahblah
/7645/reviews/blahblahblah
Etc.
This was a problem for a lot of reasons, the main one being that there were duplicates and things weren't pointing to the right places, and so on. I've since fixed them to this:
/news/2343/blahblahblah
/reviews/7645/blahblahblah
Etc.
And that's all good. But I want to block anything that falls into the pattern of the first. In other words, anything that looks like this:
** = any numerical pattern
/**/anythingelsehere
So that Google (and any others who have maybe indexed the wrong stuff) stops trying to look for these URLs that were all messed up and that don't even exist anymore. Is this possible? Should I even be doing this through robots.txt?
Answer 1:
You don't need to set up a robots.txt for that; just return 404 errors for those URLs and Google and other search engines will eventually drop them.
Google also has Webmaster Tools, which you can use to request removal of URLs from the index. I'm pretty sure other search engines offer something similar.
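For concreteness, here is a minimal sketch of returning errors for the old numeric-first URLs, assuming the site runs on something like Flask (purely an assumption; the asker's stack isn't stated). It answers any path whose first segment is purely numeric with 404; 410 Gone is an alternative that explicitly marks the URLs as permanently removed.

import re
from flask import Flask, abort, request

app = Flask(__name__)

# Hypothetical rule: the old, broken scheme put the numeric article id in
# the first path segment (e.g. /2343/news/...), so any request whose path
# starts with a purely numeric segment gets an error response.
OLD_SCHEME = re.compile(r"^/\d+(/|$)")

@app.before_request
def reject_old_scheme():
    if OLD_SCHEME.match(request.path):
        abort(404)  # or abort(410) to signal the URLs are gone for good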
Answer 2:
To answer the question: yes, you can block all URLs whose path starts with a digit.
User-agent: *
Disallow: /0
Disallow: /1
Disallow: /2
Disallow: /3
Disallow: /4
Disallow: /5
Disallow: /6
Disallow: /7
Disallow: /8
Disallow: /9
It would block URLs like:
example.com/1
example.com/2.html
example.com/3/foo
example.com/4you
example.com/52347612
These URLs would still be allowed:
example.com/foo/1
example.com/foo2.html
example.com/bar/3/foo
example.com/only4you
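You can sanity-check this behaviour with Python's standard-library robots.txt parser, which does the same plain prefix matching these rules rely on. The sketch below just replays the rules and the sample paths from the lists above; can_fetch() returns False for the blocked group and True for the allowed group.

import urllib.robotparser

rules = """\
User-agent: *
Disallow: /0
Disallow: /1
Disallow: /2
Disallow: /3
Disallow: /4
Disallow: /5
Disallow: /6
Disallow: /7
Disallow: /8
Disallow: /9
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Paths from the two example lists above.
for path in ("/1", "/2.html", "/3/foo", "/4you", "/52347612",
             "/foo/1", "/foo2.html", "/bar/3/foo", "/only4you"):
    print(path, parser.can_fetch("*", "http://example.com" + path))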
Source: https://stackoverflow.com/questions/13355409/can-i-use-robots-txt-to-block-any-directory-tree-that-starts-with-numbers