I have something of a staging server on the public internet running copies of the production code for a few websites. I'd really rather the staging sites didn't get indexed.
Create a robots.txt file with the following contents:
User-agent: *
Disallow: /
Put that file somewhere on your staging server; your document root is a great place for it (e.g. /var/www/html/robots.txt).
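To sanity-check that those two lines really do block every crawler from every path, you can feed them to Python's standard-library robots.txt parser. This is just a local sketch of the rules above; the file contents are the only thing taken from the answer itself:

```python
from urllib.robotparser import RobotFileParser

# The exact rules from the robots.txt above.
rules = """User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Neither a generic bot nor a named crawler may fetch any URL.
for agent in ("*", "Googlebot"):
    for path in ("/", "/index.html", "/staging/page"):
        assert not parser.can_fetch(agent, path)

print("all paths disallowed for all user agents")
```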
Add the following to your httpd.conf file:
# Exclude all robots
<Location "/robots.txt">
    SetHandler None
</Location>
Alias /robots.txt /path/to/robots.txt
The SetHandler directive is probably not required, but you may need it if you're using a handler such as mod_python that would otherwise intercept the request.
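If it helps to see what the Alias directive is doing, here is a toy sketch of its URL-to-filesystem mapping in Python. This is an illustration only (Apache's real matching has more rules); the mapping entry mirrors the Alias line above, and the helper function name is made up:

```python
# Toy model of Apache's Alias directive: map a request URL to a
# filesystem path by longest-prefix-style matching (simplified).
def alias_lookup(url_path, mapping):
    for prefix, target in mapping.items():
        # Alias matches the prefix itself or a subpath under it.
        if url_path == prefix or url_path.startswith(prefix + "/"):
            return target + url_path[len(prefix):]
    return None  # no alias applies; Apache serves the URL normally

# Mirrors: Alias /robots.txt /path/to/robots.txt
MAPPING = {"/robots.txt": "/path/to/robots.txt"}

print(alias_lookup("/robots.txt", MAPPING))   # served from the aliased file
print(alias_lookup("/index.html", MAPPING))   # None: handled as usual
```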
That robots.txt file will now be served for all virtual hosts on your server, overriding any robots.txt file you might have for individual hosts.
(Note: My answer is essentially the same as what ceejayoz's answer suggests, but I had to spend a few extra minutes figuring out the specifics to get it working. I decided to put this answer here for the sake of others who might stumble upon this question.)