How do I tell search engines not to index content via secondary domain names?

Submitted by 跟風遠走 on 2019-12-08 06:45:56

Question


I have a website at a.com (for example). I also have a couple of other domain names which I am not using for anything: b.com and c.com. They currently forward to a.com. I have noticed that Google is indexing content from my site using b.com/stuff and c.com/stuff, not just a.com/stuff. What is the proper way to tell Google to only index content via a.com, not b.com and c.com?

It seems as if a 301 redirect via .htaccess is the best solution, but I am not sure how to do that. There is only one .htaccess file (each domain does not have its own).

b.com and c.com are not meant to be aliases of a.com, they are just other domain names I am reserving for possible future projects.


Answer 1:


You can create a redirect in your .htaccess file like this (note the patterns must match the bare domains, not just subdomains of them):

RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www\.)?b\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^(www\.)?c\.com$ [NC]
RewriteRule ^(.*)$ http://a.com/$1 [R=301,L]
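The same host-based 301 logic can also be sketched outside Apache, for example as a small WSGI middleware in your application code. This is an illustrative sketch only (domain names are the placeholders from the question, and query strings are omitted for brevity):

```python
# Minimal WSGI middleware sketch: 301-redirect any request whose Host
# header is not the canonical domain. Domain names are illustrative.
CANONICAL_HOST = "a.com"

def canonical_redirect(app):
    def middleware(environ, start_response):
        # Strip any :port suffix and normalize case before comparing.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != CANONICAL_HOST:
            location = "http://%s%s" % (CANONICAL_HOST,
                                        environ.get("PATH_INFO", "/"))
            start_response("301 Moved Permanently",
                           [("Location", location)])
            return [b""]
        # Canonical host: pass the request through unchanged.
        return app(environ, start_response)
    return middleware
```

A request for b.com/stuff or c.com/stuff then gets a permanent redirect to a.com/stuff, which is exactly what the .htaccess rules above do at the server level.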



Answer 2:


robots.txt is the way to tell spiders what to crawl and what not to crawl. If you put the following in the root of your site at /robots.txt:

User-agent: *
Disallow: /

A well-behaved spider will not crawl any part of your site. Most large sites have a robots.txt; Google's, for example:

User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /news
#and so on ...
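You can check what a robots.txt file like the one above actually blocks with Python's standard-library parser (a small illustrative sketch; the example.com URLs are placeholders):

```python
# Check what a robots.txt blocks, using Python's standard library.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed.
rp.parse([
    "User-agent: *",
    "Disallow: /search",
    "Disallow: /images",
])

print(rp.can_fetch("*", "http://example.com/search/foo"))  # False
print(rp.can_fetch("*", "http://example.com/about"))       # True
```

This is also a quick way to verify a Disallow: / file really blocks everything before deploying it.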



Answer 3:


It largely depends on what you want to achieve. A 301 says the content has moved permanently (and it is the proper way to transfer PageRank); is that what you want?

If you just want Google to behave, you can use robots.txt, but keep in mind the downside: the file is publicly readable and always lives in the same place, so you effectively give away the location of the directories and files you may want to protect. Use robots.txt only if there is nothing worth protecting.

If there is something worth protecting, then you should password-protect the directory; that is the proper way. Google will not index password-protected directories.

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93708

For that last method you can use either httpd.conf or .htaccess; httpd.conf is preferable, even though .htaccess seems easier.

http://httpd.apache.org/docs/2.0/howto/auth.html
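A minimal Basic-auth block along those lines might look like the following (the realm name and file paths are illustrative; see the Apache auth howto linked above for the full setup, including creating the password file with htpasswd):

```apache
# In httpd.conf (inside a <Directory> block) or in .htaccess.
# The realm name and the .htpasswd path below are illustrative.
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user
```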




Answer 4:


Have your server-side code generate a canonical reference that points to the page that should be considered the "source". For example:

<link rel="canonical" href="http://a.com/stuff" />

Reference: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html - Update: this link tag is currently also supported by Ask.com, Microsoft Live Search, and Yahoo!.
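Generating the tag server-side can be as simple as the following sketch (the helper name and the a.com domain are illustrative, taken from the question):

```python
# Build a canonical <link> tag for the current request path.
# The canonical domain and helper name are illustrative.
CANONICAL_HOST = "a.com"

def canonical_link_tag(path):
    """Return the <link rel="canonical"> tag for a request path."""
    return '<link rel="canonical" href="http://%s%s" />' % (CANONICAL_HOST, path)

print(canonical_link_tag("/stuff"))
# -> <link rel="canonical" href="http://a.com/stuff" />
```

Emit this in the <head> of every page; however a crawler reaches the page (via b.com or c.com), the tag tells it that a.com holds the authoritative copy.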



Source: https://stackoverflow.com/questions/3498491/how-do-i-tell-search-engines-not-to-index-content-via-secondary-domain-names
