How to protect/monitor your site from crawling by malicious users


Situation:

  • Site with content protected by username/password (not all accounts are controlled, since they can be trial/test users)
  • A normal search engine can't get at it
9 Answers
  • 2021-02-06 19:29

    I would not recommend automatic lock-outs, not so much because they are necessarily evil, but because they provide immediate feedback to the malicious user that they tripped a sensor, and let them know not to do the same thing with the next account they sign up with.

    And user-agent blocking is probably not going to be very helpful, because obviously user-agents are very easy to fake.

    About the best you can probably do is monitoring, but then you still have to ask what you're going to do if you detect malicious behavior. As long as you have uncontrolled access, anyone you lock out can just sign up again under a different identity. I don't know what kind of info you require to get an account, but just a name and e-mail address, for instance, isn't going to be much of a hurdle for anybody.

    It's the classic DRM problem -- if anyone can see the information, then anyone can do anything else they want with it. You can make it difficult, but ultimately if someone is really determined, you can't stop them, and you risk interfering with legitimate users and hurting your business.

  • 2021-02-06 19:34

    Short answer: it can't be done reliably.

    You can go a long way by simply blocking IP addresses that cause a certain number of hits in some time frame (some webservers support this out of the box, others require some modules, or you can do it by parsing your logfile and e.g. using iptables), but you need to take care not to block the major search engine crawlers and large ISP's proxies.
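The counting step described above can be sketched in code. Below is a minimal, hypothetical sliding-window hit counter in Java (names like `IpRateLimiter` and the limits are my own illustration, not anything from the answer): each IP keeps a queue of recent hit timestamps, and once the queue holds the maximum number of hits inside the window, further hits are refused until old ones age out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sliding-window limiter: block an IP that causes
// more than maxHits requests within windowMillis.
public class IpRateLimiter {
    private final int maxHits;
    private final long windowMillis;
    private final Map<String, Deque<Long>> hits = new HashMap<>();

    public IpRateLimiter(int maxHits, long windowMillis) {
        this.maxHits = maxHits;
        this.windowMillis = windowMillis;
    }

    // Record a hit at time nowMillis; returns false if the IP is over the limit.
    public synchronized boolean allow(String ip, long nowMillis) {
        Deque<Long> times = hits.computeIfAbsent(ip, k -> new ArrayDeque<>());
        // Drop timestamps that have fallen out of the window.
        while (!times.isEmpty() && nowMillis - times.peekFirst() >= windowMillis) {
            times.pollFirst();
        }
        if (times.size() >= maxHits) {
            return false;
        }
        times.addLast(nowMillis);
        return true;
    }
}
```

In practice you would feed this from your access log or a request filter, and whitelist the known crawler and proxy IP ranges mentioned above before ever consulting the limiter.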

  • 2021-02-06 19:39

    Apache has some bandwidth-by-IP limiting modules AFAIK, and for my own largeish Java/JSP application with a lot of digital content I rolled my own servlet filter to do the same (and limit simultaneous connections from one IP block, etc).
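The "limit simultaneous connections from one IP" part of that filter can be sketched as a standalone Java class (this is my own hedged reconstruction of the idea, not the answerer's actual servlet filter; in a real filter you would call `acquire` before `chain.doFilter()` and `release` in a `finally` block):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cap the number of in-flight requests per client IP.
public class ConnectionLimiter {
    private final int maxConcurrent;
    private final Map<String, AtomicInteger> active = new ConcurrentHashMap<>();

    public ConnectionLimiter(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    // Returns true if the request may proceed; callers must pair with release().
    public boolean acquire(String ip) {
        AtomicInteger count = active.computeIfAbsent(ip, k -> new AtomicInteger());
        if (count.incrementAndGet() > maxConcurrent) {
            count.decrementAndGet(); // over the cap, undo and refuse
            return false;
        }
        return true;
    }

    public void release(String ip) {
        AtomicInteger count = active.get(ip);
        if (count != null) count.decrementAndGet();
    }
}
```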

    I agree with the comments above that it's better to be subtle, so that a malicious user cannot tell if or when they've tripped your alarms and thus doesn't know to take evasive action. In my case the server just seems to become slow, flaky, and unreliable (so no change there, then)...

    Rgds

    Damon
