I'm looking into building a content site with potentially thousands of entries, accessible by index and by search.
What measures can I take to prevent crawlers from scraping my content?
I used to have a system that would block or allow requests based on the User-Agent header. It relies on crawlers setting their User-Agent honestly, but it seems most of them do.
It won't work if they fake the header to emulate a popular browser, of course.
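For concreteness, here's a minimal sketch of the kind of check I mean, written as a Flask hook (the framework choice and the blocklist entries are just illustrative assumptions, not what I actually ran):

```python
from flask import Flask, request, abort

app = Flask(__name__)

# Hypothetical substrings seen in crawler User-Agent strings.
BLOCKED_AGENT_SUBSTRINGS = ("curl", "wget", "python-requests", "scrapy")

@app.before_request
def block_known_crawlers():
    user_agent = (request.headers.get("User-Agent") or "").lower()
    # Reject requests with no User-Agent at all, plus known crawler strings.
    if not user_agent or any(s in user_agent for s in BLOCKED_AGENT_SUBSTRINGS):
        abort(403)  # Forbidden; well-behaved crawlers identify themselves
```

As the caveat above says, this only catches crawlers that identify themselves honestly, so I'm wondering what else is worth doing.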