I want to prevent automated HTML scraping of one of our sites without affecting legitimate spidering (Googlebot, etc.). Is there something that already exists to accomplish this?
You should do what good firewalls do when they detect malicious use: let them keep going, but don't give them anything useful. If you start throwing 403s or 404s they'll know something is wrong; if you return random data, they'll just go about their business.
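Something along these lines could do the "feed them junk" part. This is only a minimal sketch assuming a Flask app (swap in whatever framework your site actually uses), and `FLAGGED_IPS` is a hypothetical set populated by the trap-link handler sketched further down.

```python
# Minimal sketch (assuming Flask): instead of answering a flagged scraper
# with 403/404, serve plausible-looking junk with a normal 200 status so
# the scraper doesn't realize it has been caught.
import random
import string

from flask import Flask, request

app = Flask(__name__)

# IPs previously caught by the hidden trap link (see the honeypot sketch below).
FLAGGED_IPS = set()


def junk_paragraph(words=80):
    """Generate a paragraph of random filler text."""
    return " ".join(
        "".join(random.choices(string.ascii_lowercase, k=random.randint(3, 10)))
        for _ in range(words)
    )


@app.before_request
def poison_flagged_scrapers():
    # Returning a value from a before_request hook short-circuits normal routing,
    # so flagged clients never reach the real handlers.
    if request.remote_addr in FLAGGED_IPS:
        body = "<html><body><p>{}</p></body></html>".format(junk_paragraph())
        return body, 200  # looks like a normal page, contains nothing useful


@app.route("/search")
def search():
    return "real search results for everyone else"
```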
For detecting malicious use, try adding a trap link to the search results page (or whatever page they're using as your site map) and hiding it with CSS. You'll need to check whether the client is claiming to be a valid bot and let those through, though. You can store the offending IP for future use, along with the result of a quick ARIN WHOIS lookup.
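Here's a rough sketch of the trap, again assuming Flask. The hidden anchor goes on the search results page where no human will ever click it; anything that follows it and can't be verified as a real search-engine crawler gets its IP flagged. I've used a reverse-DNS check as a programmatic stand-in for the quick ARIN WHOIS lookup; the path `/internal-listing` and the names `FLAGGED_IPS` and `KNOWN_BOT_SUFFIXES` are made up for illustration.

```python
# Sketch of a honeypot trap link (assuming Flask). The link on the results page
# is hidden from humans with CSS, e.g.:
#   <a href="/internal-listing" style="display:none">more results</a>
import socket

from flask import Flask, request

app = Flask(__name__)
FLAGGED_IPS = set()  # shared with the junk-data hook above

# Reverse-DNS suffixes used by the major legitimate crawlers (illustrative list).
KNOWN_BOT_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_bot(ip):
    """Verify a claimed bot by reverse DNS instead of trusting the User-Agent.
    (The original suggestion, an ARIN WHOIS lookup on the stored IP, works too,
    but is easier to do out-of-band or with a WHOIS client library.)"""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    if not hostname.endswith(KNOWN_BOT_SUFFIXES):
        return False
    # Forward-confirm: the hostname must resolve back to the same IP.
    return ip in socket.gethostbyname_ex(hostname)[2]


@app.route("/internal-listing")
def trap():
    ip = request.remote_addr
    ua = request.headers.get("User-Agent", "").lower()
    if "bot" in ua and is_verified_bot(ip):
        return "", 204  # a genuine, verified crawler; let it through untouched
    FLAGGED_IPS.add(ip)  # store the IP for future use (junk responses, WHOIS, etc.)
    return "", 204
```

In practice you'd persist the flagged IPs somewhere durable (a database or the firewall itself) rather than an in-memory set, but the flow is the same: hidden link, bot check, flag, then quietly poison everything that IP requests afterwards.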