I have something of a staging server on the public internet running copies of the production code for a few websites. I'd really rather the staging sites not get indexed.
You can use Apache's mod_rewrite to do it. Let's assume that your real host is www.example.com and your staging host is staging.example.com. Create a file called 'robots-staging.txt' and conditionally rewrite requests for robots.txt to serve that file instead.
This example would be suitable for protecting a single staging site, a bit of a simpler use case than what you are asking for, but this has worked reliably for me:
RewriteEngine on
# Dissuade web spiders from crawling the staging site
RewriteCond %{HTTP_HOST} ^staging\.example\.com$
RewriteRule ^robots\.txt$ robots-staging.txt [L]
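For this to work, robots-staging.txt has to actually exist in the staging site's document root. A minimal version that tells all well-behaved crawlers to stay out (the filename matches the rewrite target above) would look like:

```
# robots-staging.txt — disallow everything for every crawler
User-agent: *
Disallow: /
```

Note that robots.txt is advisory: well-behaved spiders like Googlebot will honor it, but it is not an access control; anything genuinely sensitive on staging should also be behind authentication.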
You could instead try to redirect the spiders to a master robots.txt on a different server, but some spiders may balk after they get anything other than a "200 OK" or "404 Not Found" return code from the HTTP request, and may never fetch the redirected URL.
Here's how you would do that:
RewriteEngine on
# Redirect web spiders to a robots.txt file elsewhere (possibly unreliable)
RewriteRule ^robots\.txt$ http://www.example.com/robots-staging.txt [R,L]