Question
I am trying to add a rule to a robots.txt file to deny access to a single page.
The website URLs work as follows:
- http://example.com/#!/homepage
- http://example.com/#!/about-us
- http://example.com/#!/super-secret
JavaScript then swaps out the DIV that is displayed based on the URL.
How would I request that a search engine spider not index the following:
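For reference, the routing is roughly like the sketch below (a simplified illustration, not the exact site code; the `.page` class and per-page DIV ids are assumptions):

```javascript
// Simplified hashbang routing sketch (assumed markup: one <div class="page" id="...">
// per page, e.g. <div class="page" id="super-secret">).
function showPageFromHash() {
  // "#!/about-us" -> "about-us"; default to the homepage when there is no hash.
  var page = window.location.hash.replace(/^#!\//, '') || 'homepage';
  document.querySelectorAll('.page').forEach(function (div) {
    div.style.display = (div.id === page) ? 'block' : 'none';
  });
}

window.addEventListener('hashchange', showPageFromHash);
window.addEventListener('load', showPageFromHash);
```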
- http://example.com/#!/super-secret
- http://example.com/index.php#!/super-secret
Thanks in advance
Answer 1:
You can actually do this in multiple ways, but here are the two simplest.
You have to exclude the URLs that Googlebot is actually going to fetch, which aren't the AJAX hashbang URLs but instead their translated ?_escaped_fragment_= equivalents.
In your robots.txt file specify:
Disallow: /?_escaped_fragment_=/super-secret
Disallow: /index.php?_escaped_fragment_=/super-secret
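Put together, the robots.txt would look something like this (the User-agent: * line is my assumption; swap in Googlebot if you only want to target Google's crawler):

```
User-agent: *
Disallow: /?_escaped_fragment_=/super-secret
Disallow: /index.php?_escaped_fragment_=/super-secret
```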
When in doubt, you should always use the Google Webmaster Tools » "Fetch As Googlebot" feature.
If the page has already been indexed by Googlebot, using a robots.txt file won't remove it from the index. You'll either have to use the Google Webmaster Tools URL removal tool after you apply the robots.txt, or instead you can add a noindex directive to the page via a <meta> tag or an X-Robots-Tag in the HTTP headers.
It would look something like:
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW" />
or
X-Robots-Tag: noindex
Answer 2:
You can't (per se). Search engines won't run the JavaScript anyway, so they will generally ignore the fragment identifier. You can only deny the URLs that would actually be requested from the server (which don't include fragment identifiers).
Google maps hashbangs onto different URIs, and you can figure out what those are (you should have done so already, because that is the whole point of using hashbangs) and put those in robots.txt.
Hashbangs, however, are problematic at best, so I'd scrap them in favour of the History API, which allows you to use sane URIs, as sketched below.
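As a rough sketch (assuming the same one-DIV-per-page structure from the question, with hypothetical data-page links), a History API version of that routing could look like:

```javascript
// Rough History API routing sketch (assumed markup; the server must also be set up
// to serve the app shell for /about-us, /super-secret, etc.).
function showPage(page) {
  document.querySelectorAll('.page').forEach(function (div) {
    div.style.display = (div.id === page) ? 'block' : 'none';
  });
}

// Intercept clicks on in-app links and push a clean URL instead of a hashbang.
document.addEventListener('click', function (event) {
  var link = event.target.closest('a[data-page]');
  if (!link) return;
  event.preventDefault();
  history.pushState({ page: link.dataset.page }, '', '/' + link.dataset.page);
  showPage(link.dataset.page);
});

// Keep the back/forward buttons working.
window.addEventListener('popstate', function (event) {
  showPage((event.state && event.state.page) || 'homepage');
});
```

With real paths like /super-secret, blocking the page in robots.txt also becomes the straightforward Disallow: /super-secret.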
Source: https://stackoverflow.com/questions/16987717/robots-txt-deny-for-a-url