Short answer: it can't be done reliably.
You can go a long way by simply blocking IP addresses that make more than a certain number of requests within some time frame. Some web servers support this out of the box, others need an extra module, or you can do it yourself by parsing your log file and adding firewall rules with, e.g., iptables. However, you need to take care not to block the major search engine crawlers or the proxies of large ISPs, since many legitimate users can sit behind a single proxy address.
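As a rough illustration of the "N hits in some time frame" rule, here is a minimal sliding-window sketch in Python. It is not tied to any particular web server or module. The class name `SlidingWindowLimiter`, the thresholds, and the allowlist are all hypothetical. The allowlist stands in for whatever crawler and proxy addresses you decide to exempt. In practice you would probably verify crawlers (for example via reverse DNS) rather than hard-code addresses.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Iterable, Optional, Set


class SlidingWindowLimiter:
    """Block an IP once it makes more than max_hits requests within window_seconds.

    Hypothetical sketch: max_hits, window_seconds and the allowlist are
    illustrative assumptions, not settings from any real web server.
    """

    def __init__(self, max_hits: int, window_seconds: float,
                 allowlist: Iterable[str] = (),
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_hits = max_hits
        self.window = window_seconds
        self.allowlist: Set[str] = set(allowlist)  # e.g. verified crawlers, big ISP proxies
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.blocked: Set[str] = set()

    def record(self, ip: str, now: Optional[float] = None) -> bool:
        """Register one request from ip; return True if ip is (now) blocked."""
        if ip in self.allowlist:
            return False  # never block exempted addresses
        if ip in self.blocked:
            return True
        t = self.clock() if now is None else now
        q = self.hits[ip]
        q.append(t)
        # Drop timestamps that have fallen out of the time window.
        while q and q[0] <= t - self.window:
            q.popleft()
        if len(q) > self.max_hits:
            self.blocked.add(ip)
            del self.hits[ip]  # no need to keep counting a blocked IP
            return True
        return False


if __name__ == "__main__":
    # Documentation-range addresses (RFC 5737), used here as placeholders.
    limiter = SlidingWindowLimiter(max_hits=3, window_seconds=10.0,
                                   allowlist={"192.0.2.10"})
    for t in range(5):
        print(t, limiter.record("198.51.100.7", now=t))
```

When parsing a log file you would feed each request's timestamp in via `now`. The resulting `blocked` set is what you would then hand to your firewall. Keeping the exemption check first means an allowlisted proxy is never counted at all, however busy it is.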