How can I configure my site to allow crawling from well-known robots like Google, Bing, Yahoo, Alexa, etc., and stop other harmful spammers and robots?
Should I block particular IP addresses?
Blocking by IP can be useful, but the method I use is blocking by user-agent. That way you can trap many different IPs running applications you don't want, especially site grabbers. I won't provide our list, because you need to concentrate on the ones that actually affect you; for our own use we have identified more than 130 applications that are neither web browsers nor search engines and that we don't want accessing our site. A web search on user-agents used for site grabbing is a good place to start.
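If your site runs behind an application framework rather than only a static web server, the matching can also be done in code. Below is a minimal sketch, assuming a Flask application; the listed user-agent fragments (HTTrack, Wget, WebZIP) are only illustrative placeholders, not anyone's real blocklist, and you would substitute the entries you identify from your own logs.

```python
from flask import Flask, request, abort

app = Flask(__name__)

# Illustrative user-agent fragments of site grabbers / downloaders.
# Build your own list from your server logs and research.
BLOCKED_USER_AGENTS = [
    "httrack",   # HTTrack website copier
    "wget",      # command-line downloader
    "webzip",    # WebZIP site grabber
]

@app.before_request
def block_unwanted_user_agents():
    # Compare the request's User-Agent header (case-insensitively)
    # against the blocklist and reject matches with 403 Forbidden.
    ua = (request.headers.get("User-Agent") or "").lower()
    if any(bad in ua for bad in BLOCKED_USER_AGENTS):
        abort(403)

@app.route("/")
def index():
    return "Hello, welcome crawlers!"

if __name__ == "__main__":
    app.run()
```

In practice this kind of rule more often lives in the web server configuration (for example Apache's mod_rewrite matching on HTTP_USER_AGENT, or an nginx map), but the logic is the same: inspect the User-Agent header and refuse requests that match your blocklist, while leaving the well-known search engine crawlers alone.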