How do I tell search engines not to index content via secondary domain names?

Submitted by 跟風遠走 on 2019-12-08 06:45:56

Question


I have a website at a.com (for example). I also have a couple of other domain names which I am not using for anything: b.com and c.com. They currently forward to a.com. I have noticed that Google is indexing content from my site using b.com/stuff and c.com/stuff, not just a.com/stuff. What is the proper way to tell Google to only index content via a.com, not b.com and c.com?

It seems as if a 301 redirect via .htaccess is the best solution, but I am not sure how to do that. There is only one .htaccess file (each domain does not have its own).

b.com and c.com are not meant to be aliases of a.com, they are just other domain names I am reserving for possible future projects.


Answer 1:


You can create a redirect in your .htaccess file like this (note the patterns must match the bare domains, not just subdomains of them):

RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www\.)?b\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^(www\.)?c\.com$ [NC]
RewriteRule ^(.*)$ http://a.com/$1 [R=301,L]
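The same host-based 301 logic can also be sketched outside Apache, for example as a small WSGI middleware in your application code. This is an illustrative sketch only (domain names are the placeholders from the question, and query strings are omitted for brevity):

```python
# Minimal WSGI middleware sketch: 301-redirect any request whose Host
# header is not the canonical domain. Domain names are illustrative.
CANONICAL_HOST = "a.com"

def canonical_redirect(app):
    def middleware(environ, start_response):
        # Strip any :port suffix and normalize case before comparing.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != CANONICAL_HOST:
            location = "http://%s%s" % (CANONICAL_HOST,
                                        environ.get("PATH_INFO", "/"))
            start_response("301 Moved Permanently",
                           [("Location", location)])
            return [b""]
        # Canonical host: pass the request through unchanged.
        return app(environ, start_response)
    return middleware
```

A request for b.com/stuff or c.com/stuff then gets a permanent redirect to a.com/stuff, which is exactly what the .htaccess rules above do at the server level.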



Answer 2:


robots.txt is the way to tell spiders what to crawl and what not to crawl. If you put the following in the root of your site at /robots.txt:

User-agent: *
Disallow: /

A well-behaved spider will not crawl any part of your site. Most large sites have a robots.txt; Google's, for example:

User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /news
#and so on ...
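You can check what a robots.txt file like the one above actually blocks with Python's standard-library parser (a small illustrative sketch; the example.com URLs are placeholders):

```python
# Check what a robots.txt blocks, using Python's standard library.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed.
rp.parse([
    "User-agent: *",
    "Disallow: /search",
    "Disallow: /images",
])

print(rp.can_fetch("*", "http://example.com/search/foo"))  # False
print(rp.can_fetch("*", "http://example.com/about"))       # True
```

This is also a quick way to verify a Disallow: / file really blocks everything before deploying it.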



Answer 3:


It largely depends on what you want to achieve. A 301 says the content has moved permanently (and it is the proper way to transfer PageRank); is that what you want?

If you just want Google to behave, you can use robots.txt, but keep in mind the downside: the file is publicly readable and always lives in the same place, so you effectively give away the location of the directories and files you may want to protect. Use robots.txt only if there is nothing worth protecting.

If there is something worth protecting, then you should password-protect the directory; that is the proper way. Google will not index password-protected directories.

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93708

For that last method you can use either httpd.conf or .htaccess; httpd.conf is preferable, even though .htaccess seems easier.

http://httpd.apache.org/docs/2.0/howto/auth.html
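A minimal Basic-auth block along those lines might look like the following (the realm name and file paths are illustrative; see the Apache auth howto linked above for the full setup, including creating the password file with htpasswd):

```apache
# In httpd.conf (inside a <Directory> block) or in .htaccess.
# The realm name and the .htpasswd path below are illustrative.
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user
```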




Answer 4:


Have your server-side code generate a canonical reference that points to the page that should be considered the "source". For example:

<link rel="canonical" href="http://a.com/stuff" />

Reference: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html - Update: this link tag is currently also supported by Ask.com, Microsoft Live Search, and Yahoo!.
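Generating the tag server-side can be as simple as the following sketch (the helper name and the a.com domain are illustrative, taken from the question):

```python
# Build a canonical <link> tag for the current request path.
# The canonical domain and helper name are illustrative.
CANONICAL_HOST = "a.com"

def canonical_link_tag(path):
    """Return the <link rel="canonical"> tag for a request path."""
    return '<link rel="canonical" href="http://%s%s" />' % (CANONICAL_HOST, path)

print(canonical_link_tag("/stuff"))
# -> <link rel="canonical" href="http://a.com/stuff" />
```

Emit this in the <head> of every page; however a crawler reaches the page (via b.com or c.com), the tag tells it that a.com holds the authoritative copy.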



Source: https://stackoverflow.com/questions/3498491/how-do-i-tell-search-engines-not-to-index-content-via-secondary-domain-names
