How can I configure my site to allow crawling by well-known robots like Google, Bing, Yahoo, Alexa, etc., and stop other harmful spammers and robots?
Should I block particular robots?
The simplest way of doing this is to use a robots.txt file in the root directory of the website. The syntax is as follows:
User-agent: *
Disallow: /
This example disallows all robots that respect the robots.txt convention from the entire site.
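For your case, you can instead grant access per user agent and block everyone else. A sketch, assuming the commonly published crawler tokens (Googlebot for Google, Bingbot for Bing, Slurp for Yahoo, ia_archiver for Alexa — verify these against each crawler's own documentation, as they can change):

```
# Allow the major search engines (an empty Disallow means "allow everything")
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: ia_archiver
Disallow:

# Block everything else that honours robots.txt
User-agent: *
Disallow: /
```

Crawlers use the most specific matching User-agent group, so the named bots fall through to their own rules rather than the catch-all block.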
The thing to remember, though, is that not all web crawlers respect this convention.
It can be very useful for preventing bots from hitting the server an insane number of times, and also for keeping away some bots which you would prefer didn't touch the site at all, but it is unfortunately not a cure-all. As has been mentioned already, there is no such animal; spam is a constant headache. Misbehaving bots simply ignore robots.txt, so blocking them has to happen server-side.
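For the crawlers that ignore robots.txt, one common server-side option is to reject requests by User-Agent string at the web server. A minimal sketch for Apache with mod_rewrite enabled ("BadBot" and "EvilScraper" are placeholder names, not real bots — substitute the user-agent strings you actually see abusing your logs):

```
# .htaccess sketch: return 403 Forbidden to matching user agents
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} BadBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EvilScraper [NC]
RewriteRule .* - [F,L]
```

Keep in mind the User-Agent header is trivially spoofed, so this only stops bots that identify themselves honestly; persistent abusers usually need IP-level blocking or rate limiting on top.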
For more info, have a look at http://www.robotstxt.org/