Robots.txt deny, for a #! URL


Question


I am trying to add a deny rule to a robots.txt file to deny access to a single page.

The website URLs work as follows:

  • http://example.com/#!/homepage
  • http://example.com/#!/about-us
  • http://example.com/#!/super-secret

JavaScript then swaps out the DIV that is displayed, based on the fragment in the URL.

How would I request that a search engine spider not index the following:

  • http://example.com/#!/super-secret
  • http://example.com/index.php#!/super-secret

Thanks in advance


Answer 1:


You can actually do this multiple ways, but here are the two simplest.

You have to exclude the URLs that Googlebot is actually going to fetch, which aren't the AJAX hashbang URLs themselves, but instead the translated ?_escaped_fragment_=key=value forms (so #!/super-secret is fetched as ?_escaped_fragment_=/super-secret).

In your robots.txt file, specify:

Disallow: /?_escaped_fragment_=/super-secret
Disallow: /index.php?_escaped_fragment_=/super-secret
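
Note that Disallow lines only take effect inside a User-agent group, so a minimal complete robots.txt (assuming you want the rule to apply to all crawlers) would look like:

User-agent: *
Disallow: /?_escaped_fragment_=/super-secret
Disallow: /index.php?_escaped_fragment_=/super-secret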

When in doubt, you should always use the Google Webmaster Tools » "Fetch as Googlebot" feature to check what Googlebot actually requests.

If the page has already been indexed by Googlebot, using a robots.txt file won't remove it from the index. You'll either have to use the Google Webmaster Tools URL removal tool after you apply the robots.txt rules, or you can add a noindex directive to the page via a <meta> tag or an X-Robots-Tag in the HTTP headers.

It would look something like:

<meta name="ROBOTS" content="NOINDEX, NOFOLLOW" />

or

X-Robots-Tag: noindex
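
For context, X-Robots-Tag travels as an ordinary HTTP response header, so in the raw response serving the page it would sit alongside the other headers (the status line and Content-Type here are just illustrative):

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
X-Robots-Tag: noindex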



Answer 2:


You can't, per se. Search engines generally don't run JavaScript, so they ignore the fragment identifier; in fact, the fragment is never even sent to the server. You can only deny the URLs that are actually requested from the server, and those arrive without fragment identifiers.

Google maps hashbangs onto different URIs (e.g. http://example.com/#!/super-secret becomes http://example.com/?_escaped_fragment_=/super-secret). You can figure out what those are (and you should have done so already, because that is the point of using hashbangs) and put them in robots.txt.

Hashbangs, however, are problematic at best, so I'd scrap them in favour of the History API, which lets you use sane URIs.
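
As a rough sketch of that alternative (the showPage and renderPage names here are made up; renderPage stands in for whatever code currently swaps out the displayed DIV), History API navigation looks like this:

// Navigate to a clean URL without a page reload.
function showPage(page) {
  history.pushState({ page: page }, '', '/' + page);
  renderPage(page); // hypothetical: your existing DIV-swapping logic
}

// Re-render when the user navigates with the back/forward buttons.
window.addEventListener('popstate', function (event) {
  renderPage(event.state ? event.state.page : 'homepage');
});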



Source: https://stackoverflow.com/questions/16987717/robots-txt-deny-for-a-url
