Ignore URLs in robots.txt with specific parameters?

Submitted by 我的未来我决定 on 2019-12-17 10:31:56

Question


I would like Google to ignore URLs like this:

http://www.mydomain.com/new-printers?dir=asc&order=price&p=3

All URLs that have the parameters dir, order, and p (as in the example above) should be ignored, but I don't have any experience with robots.txt.

Any idea?


Answer 1:


Here's a solution if you want to disallow all query strings:

Disallow: /*?*

or, if you want to be more precise about the query string:

Disallow: /*?dir=*&order=*&p=*

(Note that this pattern only matches URLs whose parameters appear in exactly that order.)

You can also add an Allow rule to robots.txt for URLs that should remain crawlable:

Allow: /new-printer$

The $ anchors the pattern at the end of the URL, so only the exact path /new-printer will be allowed, not /new-printers or /new-printer?dir=asc.
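As a sketch of how Googlebot-style wildcard matching works, a rule can be translated into a regular expression: `*` matches any run of characters, and a trailing `$` anchors the match at the end of the URL. (The `matches` helper below is a hypothetical illustration, not part of any library.)

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex, the way
    Googlebot interprets it: '*' matches any characters and a
    trailing '$' anchors the match at the end of the URL."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape regex metacharacters, then restore '*' as '.*'
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.compile(pattern + ("$" if anchored else ""))

def matches(rule: str, path: str) -> bool:
    # Rules match from the start of the URL path
    return rule_to_regex(rule).match(path) is not None

# The broad rule blocks any URL with a query string ...
print(matches("/*?*", "/new-printers?dir=asc&order=price&p=3"))  # True
# ... while 'Allow: /new-printer$' matches only that exact path
print(matches("/new-printer$", "/new-printer"))   # True
print(matches("/new-printer$", "/new-printers"))  # False
```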

More info:

http://code.google.com/web/controlcrawlindex/docs/robots_txt.html

http://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/




Answer 2:


You can block those specific query string parameters with the following lines:

Disallow: /*?*dir=
Disallow: /*?*order=
Disallow: /*?*p=

So if any URL contains dir=, order=, or p= anywhere in the query string, it will be blocked.
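To sketch why parameter order doesn't matter with these rules: each `*` matches any run of characters, so a rule like `/*?*dir=` matches `dir=` wherever it sits in the query string. (The `blocked` helper below is a hypothetical illustration under the same Googlebot-style wildcard semantics, not a real API.)

```python
import re

def blocked(rule: str, url_path: str) -> bool:
    # '*' in a robots.txt rule matches any run of characters;
    # rules match from the start of the URL path.
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.match(pattern, url_path) is not None

rules = ["/*?*dir=", "/*?*order=", "/*?*p="]
for url in ["/new-printers?dir=asc&order=price&p=3",
            "/new-printers?p=3",   # single parameter, any position
            "/new-printers"]:      # no query string: not blocked
    print(url, any(blocked(r, url) for r in rules))
```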




Answer 3:


Register your website with Google Webmaster Tools (now Google Search Console). There you can tell Google how to handle your parameters.

Site Configuration -> URL Parameters

You should also have the pages that contain those parameters indicate that they should be excluded from indexing via the robots meta tag, e.g.:

<meta name="robots" content="noindex, nofollow">



Source: https://stackoverflow.com/questions/9149782/ignore-urls-in-robot-txt-with-specific-parameters
