How to disallow search pages from robots.txt

扶醉桌前 submitted on 2019-12-21 20:26:58

Question


I need to disallow http://example.com/startup?page=2 search pages from being indexed.

I want http://example.com/startup to be indexed, but not http://example.com/startup?page=2, http://example.com/startup?page=3, and so on.

Also, the "startup" segment can be any string, e.g., http://example.com/XXXXX?page


Answer 1:


Something like this works, as confirmed by the "test robots.txt" function in Google Webmaster Tools:

User-Agent: *
Disallow: /startup?page=

From the robots.txt standard, on the Disallow field: "The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved."
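
The prefix rule quoted above can be illustrated with a minimal Python sketch. The helper name `is_disallowed` and the sample paths are assumptions made for illustration; real crawlers also apply Allow lines and per-agent groups, which are not modeled here.

```python
def is_disallowed(path, disallow_rules):
    """Return True if `path` starts with any non-empty Disallow value."""
    # An empty Disallow value means "nothing is disallowed", so it is skipped.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


# Hypothetical rule set mirroring the robots.txt shown above.
rules = ["/startup?page="]

print(is_disallowed("/startup", rules))         # False
print(is_disallowed("/startup?page=2", rules))  # True
print(is_disallowed("/other?page=2", rules))    # False
```

The last call shows why a plain prefix is not enough when the first path segment varies.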

However, if the first part of the URL varies, you must use wildcards, which Google and other major crawlers support as an extension to the original robots.txt convention:

User-Agent: *
Disallow: /startup?page=
Disallow: *page=
Disallow: *?page=
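
As a rough illustration of how such wildcard rules behave, here is a small Python sketch. It assumes the Google-style semantics in which `*` matches any sequence of characters and a trailing `$` anchors the end of the URL; the function names `pattern_to_regex` and `is_blocked` are hypothetical, and this is not how any particular crawler is implemented.

```python
import re


def pattern_to_regex(pattern):
    """Translate a wildcard Disallow value into a compiled regex."""
    # A trailing "$" anchors the pattern to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then let "*" match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """Return True if any non-empty rule matches from the start of `path`."""
    return any(
        rule and pattern_to_regex(rule).match(path) is not None
        for rule in disallow_rules
    )


rules = ["*?page="]

print(is_blocked("/XXXXX?page=2", rules))    # True
print(is_blocked("/startup?page=3", rules))  # True
print(is_blocked("/startup", rules))         # False
```

Under these assumptions, a single `*?page=` rule covers every first segment, while the unpaginated URL stays crawlable.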



Answer 2:


You can put this on the pages you do not want indexed:

<META NAME="ROBOTS" CONTENT="NONE">

This tells robots not to index the page or follow its links; NONE is equivalent to NOINDEX,NOFOLLOW.

On a search page, this may be more useful:

<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">

This instructs robots not to index the current page but still to follow its links, so they can reach the pages found by the search.
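
To make the difference between the two tags concrete, the following Python sketch reads a robots meta tag and reports whether indexing and link-following are permitted. The class `RobotsMetaParser` and function `robots_policy` are hypothetical names; the sketch only models the NOINDEX, NOFOLLOW, and NONE values and assumes that a missing tag means both are allowed.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def robots_policy(html):
    """Return (may_index, may_follow) for the given HTML snippet."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    may_index = not (found & {"noindex", "none"})
    may_follow = not (found & {"nofollow", "none"})
    return may_index, may_follow


print(robots_policy('<META NAME="ROBOTS" CONTENT="NONE">'))            # (False, False)
print(robots_policy('<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">'))  # (False, True)
```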




Answer 3:


  1. Create a text file and name it: robots.txt
  2. Add user agents and disallow sections (see sample below)
  3. Place the file in the root of your site

Sample:

###############################
#My robots.txt file
#
User-agent: *
#
#list directories robots are not allowed to index 
#
Disallow: /testing/
Disallow: /staging/
Disallow: /admin/
Disallow: /assets/
Disallow: /images/
#
#
#list specific files robots are not allowed to index
#
Disallow: /startup?page=2
Disallow: /startup?page=3
# 
#
#End of robots.txt file
#
###############################
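
If the list of blocked paths is maintained elsewhere, steps 1 and 2 can be scripted. The sketch below is one possible way to do it in Python; the function name `build_robots_txt` and the shortened path list are assumptions for illustration.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt content with one Disallow line per path."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    # End with a newline so the file is a well-formed text file.
    return "\n".join(lines) + "\n"


# Hypothetical subset of the paths from the sample above.
paths = ["/testing/", "/admin/", "/startup?page=2"]

print(build_robots_txt(paths), end="")
```

Saving the printed text as robots.txt in the root of the site completes step 3.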

For a real-world example, see Google's own robots.txt file at https://www.google.com/robots.txt

For more detail, see the Google Webmaster help topic on blocking or removing pages using a robots.txt file.



Source: https://stackoverflow.com/questions/1517541/how-to-disallow-search-pages-from-robots-txt
