Can I block search crawlers for every site on an Apache web server?

走了就别回头了 2021-01-31 05:13

I have somewhat of a staging server on the public internet running copies of the production code for a few websites. I'd really rather the staging sites didn't get indexed.

6 Answers
  •  醉话见心
    2021-01-31 05:58

    Create a robots.txt file with the following contents:

    User-agent: *
    Disallow: /
    

    Put that file somewhere on your staging server; your directory root is a great place for it (e.g. /var/www/html/robots.txt).

    Add the following to your httpd.conf file:

    # Exclude all robots
    <Location "/robots.txt">
        SetHandler None
    </Location>
    Alias /robots.txt /path/to/robots.txt
    
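    If you also want an explicit no-index signal on every HTTP response (robots.txt only affects crawlers that honor it before fetching), mod_headers can set the X-Robots-Tag header server-wide. A minimal sketch, assuming mod_headers is enabled:

    Header set X-Robots-Tag "noindex, nofollow"
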

    The SetHandler directive is probably not required, but it can be necessary if you're using a handler such as mod_python.

    That robots.txt file will now be served for all virtual hosts on your server, overriding any robots.txt file you might have for individual hosts.
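
    To confirm the override is actually being served, you can fetch and parse it with Python's standard-library robots.txt parser (staging.example.com is a placeholder for any of your virtual hosts):

    from urllib.robotparser import RobotFileParser

    # staging.example.com is a placeholder; use any vhost on this server
    rp = RobotFileParser()
    rp.set_url("http://staging.example.com/robots.txt")
    rp.read()  # fetch and parse the live robots.txt

    # Both should print False if the Disallow: / rule is in effect
    print(rp.can_fetch("Googlebot", "http://staging.example.com/"))
    print(rp.can_fetch("*", "http://staging.example.com/any/page"))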

    (Note: My answer is essentially the same thing that ceejayoz's answer is suggesting you do, but I had to spend a few extra minutes figuring out all the specifics to get it to work. I decided to put this answer here for the sake of others who might stumble upon this question.)
