I have a website at a.com (for example). I also have a couple of other domain names which I am not using for anything: b.com and c.com. They currently forward to a.com. I have noticed that Google is indexing content from my site using b.com/stuff and c.com/stuff, not just a.com/stuff. What is the proper way to tell Google to only index content via a.com, not b.com and c.com?
It seems as if a 301 redirect via .htaccess is the best solution, but I am not sure how to do that. There is only one .htaccess file (the domains do not each have their own).
b.com and c.com are not meant to be aliases of a.com, they are just other domain names I am reserving for possible future projects.
You can simply create a redirect with a .htaccess file like this:
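RewriteEngine on
# Match b.com or c.com, with or without a subdomain such as www
RewriteCond %{HTTP_HOST} (^|\.)b\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} (^|\.)c\.com$ [NC]
# Permanently redirect to the same path on a.com
RewriteRule ^(.*)$ http://a.com/$1 [R=301,L]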
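Host patterns are easy to get subtly wrong: a pattern such as `\.b\.com$` requires a dot before the name, so it matches www.b.com but not the bare b.com, whereas `(^|\.)b\.com$` matches both. The following is a rough Python sketch of the host check and redirect target, useful only for testing the pattern offline. The function name `redirect_target` and the combined pattern are illustrative assumptions, not part of Apache.

```python
import re

# Hypothetical stand-in for the two RewriteCond patterns joined by [OR].
# (^|\.) accepts both the bare domain and any subdomain.
SECONDARY_HOST = re.compile(r"(^|\.)(b|c)\.com$", re.IGNORECASE)


def redirect_target(host, path):
    """Return the a.com URL a 301 would point to, or None if no redirect applies."""
    if SECONDARY_HOST.search(host):
        return "http://a.com/" + path.lstrip("/")
    return None


if __name__ == "__main__":
    for host in ("b.com", "www.b.com", "c.com", "a.com"):
        print(host, "->", redirect_target(host, "/stuff"))
```

Requests for a.com itself return None here, which mirrors the fact that the rewrite rule must not fire for the primary domain (otherwise it would redirect in a loop).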
robots.txt is the way to tell spiders what to crawl and what not to crawl. If you put the following in the root of your site at /robots.txt:
User-agent: *
Disallow: /
A well-behaved spider will not crawl any part of your site. Most large sites have a robots.txt; Google's, for example, looks like this:
User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /news
#and so on ...
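To check how a well-behaved crawler would interpret rules like the ones above, Python's standard `urllib.robotparser` module can parse a robots.txt body and answer whether a path may be fetched. This is only a sketch: the helper name `allowed` and the user agent `ExampleBot` are made up for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The two rule sets shown in the answer (the second one abbreviated).
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
PARTIAL = "User-agent: *\nDisallow: /search\nDisallow: /groups\n"

if __name__ == "__main__":
    print(allowed(BLOCK_ALL, "/stuff"))   # everything is disallowed
    print(allowed(PARTIAL, "/search"))    # explicitly disallowed prefix
    print(allowed(PARTIAL, "/stuff"))     # not listed, so allowed
```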
It pretty much depends on what you want to achieve. A 301 says that the content has moved permanently (and it is the proper way of transferring PageRank). Is this what you want to achieve?
You want Google to behave? Then you may use robots.txt, but keep in mind there is a downside: this file is publicly readable and always located in the same place, so you basically give away the location of directories and files that you may want to protect. So use robots.txt only if there is nothing worth protecting.
If there is something worth protecting, then you should password-protect the directory; this would be the proper way. Google will not index password-protected directories.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93708
For the last method, it depends on whether you want to use the httpd.conf file or .htaccess. Using httpd.conf is the better option, even if .htaccess seems easier.
Have your server-side code generate a canonical reference (a link tag with rel="canonical" in the page's head) that points to the page to be considered the "source".
Reference: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html - Update: this link-tag is currently also supported by Ask.com, Microsoft Live Search and Yahoo!.
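As a minimal sketch of generating such a canonical link tag server-side, the Python snippet below rewrites whatever URL was requested so that the tag always points at the primary host. The names `CANONICAL_HOST` and `canonical_link` are hypothetical, and a.com is simply the primary domain from the question; adapt both to the actual framework and site.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: a.com is the primary domain, as in the question.
CANONICAL_HOST = "a.com"


def canonical_link(request_url):
    """Build a <link rel="canonical"> tag pointing at the same path on the primary host."""
    parts = urlsplit(request_url)
    canonical = urlunsplit(
        (parts.scheme or "http", CANONICAL_HOST, parts.path or "/", parts.query, "")
    )
    # Escape the URL so characters such as & are valid inside the attribute.
    return '<link rel="canonical" href="%s" />' % escape(canonical, quote=True)


if __name__ == "__main__":
    print(canonical_link("http://b.com/stuff"))
```

The sketch keeps the query string and drops the fragment. Whether query parameters belong in the canonical URL depends on the site, so treat that as a choice to revisit.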
Source: https://stackoverflow.com/questions/3498491/how-do-i-tell-search-engines-not-to-index-content-via-secondary-domain-names