Question
I run a small, niche personal ads site. People post ads and then other people reply to them, which sends an email to the original creator of the ad telling them that someone is interested and giving them contact information for that interested person.
Lately there's been some weird spam. People are receiving nonsense replies to their ads. Here is an example of one:
Name: xkauwvyr
Reply: vRYmbI <a href="http://rypmoxdkfblf.com/">rypmoxdkfblf</a>, [url=http://pnjlwvhizwbq.com/]pnjlwvhizwbq[/url], [link=http://hmenwoujxrfv.com/]hmenwoujxrfv[/link], http://ogsekuhoyeud.com/
They vary in length and composition, but they all look roughly like that. My first idea was to simply throw out any reply that contains the string "a href". But this has me interested in a more robust method of preventing nonsense: maybe look at every word and, if more than a certain percentage are not in a dictionary, throw that reply out. What should I do?
Also, is this spam just some ass playing a trick on my website, or is it something more malicious?
Answer 1:
One trick that a lot of developers use is a hidden honeypot field in the form. Spam bots generally fill out all fields, or at least the ones they think are required. So you add an input named phone or something similar, then hide it with CSS. If the field comes back filled out, you know that a bot submitted the form, and you don't process it.
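Here is a minimal server-side sketch of that check in Python. The field name `phone` comes from the example above; the form markup, the `hp` CSS class, the `/reply` action and the function name are all made up for illustration and not tied to any framework.

```python
# Hypothetical form markup: the "phone" input is hidden from humans with CSS,
# so a real visitor leaves it empty while a naive bot tends to fill it in.
FORM_HTML = """
<style>.hp { display: none; }</style>
<form method="post" action="/reply">
  <input type="text" name="name">
  <textarea name="reply"></textarea>
  <input type="text" name="phone" class="hp" tabindex="-1" autocomplete="off">
  <input type="submit" value="Send">
</form>
"""

HONEYPOT_FIELD = "phone"  # assumed name; any plausible-looking field would do


def is_bot_submission(form: dict) -> bool:
    """Return True when the hidden honeypot field came back non-empty."""
    value = form.get(HONEYPOT_FIELD) or ""
    return bool(value.strip())
```

In the request handler you would call `is_bot_submission(submitted_fields)` before sending any email and silently drop the reply when it returns True. Bots that parse CSS or skip unknown fields will get past this, so treat it as one cheap layer rather than a complete fix.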
Answer 2:
Check out reCAPTCHA - http://code.google.com/apis/recaptcha/; it's really easy to implement. It's not likely that someone is coming in and manually entering these things. It's probably a bot.
Not sure if it's malicious, and I'm not going to attempt to find out. It's someone trying to make money, through ad views or, worse, by exploiting browser flaws and installing malware, or any number of other things. Either way you want it gone, and a CAPTCHA is a great way to do it.
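To give a feel for the server side, here is a rough Python sketch of verifying a reCAPTCHA token. It assumes the current `siteverify` endpoint, which takes `secret`, `response` and an optional `remoteip` and answers with JSON containing a `success` flag; check the reCAPTCHA docs for the version you actually use. The function names and the 5 second timeout are my own choices.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str, remote_ip: str = "") -> urllib.request.Request:
    """Build the POST request that asks reCAPTCHA to validate a token."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def parse_verify_response(body: bytes) -> bool:
    """Return True only when the JSON reply says success is true."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return False
    return isinstance(payload, dict) and payload.get("success") is True


def captcha_passed(secret: str, token: str, remote_ip: str = "") -> bool:
    """Check a submitted token; an empty token fails without a network call."""
    if not token:
        return False
    request = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(request, timeout=5) as response:
        return parse_verify_response(response.read())
```

The token is whatever the widget posted along with the form (in current versions, the `g-recaptcha-response` field), and the secret is the private key for your site. Only send the notification email when `captcha_passed(...)` returns True.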
Another thing you can do is block the IP address of whoever is posting the stuff; that can help cut it down as well. Of course it's trivial for them to use a proxy or whatever, but you can never stop this stuff completely. It's basically a war, and winning little battles can go a long way.
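A blocklist check can be as small as the following Python sketch. The addresses are documentation-only example ranges, not real offenders; in practice you would fill the list from your own logs. The names `BLOCKED_NETWORKS` and `is_blocked` are made up for this example.

```python
import ipaddress

# Placeholder entries (reserved documentation ranges); replace with
# addresses or ranges taken from your own server logs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # a whole range
    ipaddress.ip_network("198.51.100.17/32"),  # a single address
]


def is_blocked(remote_addr: str) -> bool:
    """Return True when the client address falls inside a blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable address: not treated as blocked here; tighten if you prefer.
        return False
    return any(addr in network for network in BLOCKED_NETWORKS)
```

Call `is_blocked(client_ip)` at the top of the reply handler and reject the request when it returns True.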
Edit
Regarding your idea of analyzing the text: that is a massive task that people have worked on since spam started. You can research how email spam is filtered with Bayesian analysis and heuristic approaches. You won't want to spend that much time on it though, trust me.
If you just want to use something off the shelf, check out Akismet - http://akismet.com/. It rolls up that kind of functionality in an API. It started as a WordPress plugin and has evolved into a standalone service that you can throw a comment at, and it will reply with whether it is likely spam.
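If you still want the quick heuristic from the question before reaching for a service, it could look roughly like this in Python. This is a sketch, not a real spam filter: the tiny `KNOWN_WORDS` set stands in for a proper word list, and the 0.5 threshold and the list of link markers are arbitrary assumptions.

```python
import re

# Markup that a plain-text reply has no reason to contain.
LINK_MARKERS = ("<a href", "[url=", "[link=")

# Tiny stand-in dictionary; a real one would be loaded from a word list file.
KNOWN_WORDS = {
    "hi", "hello", "i", "am", "interested", "in", "your", "ad", "is",
    "this", "still", "available", "please", "call", "me", "the", "a",
    "and", "you", "to",
}

WORD_RE = re.compile(r"[a-z']+")


def unknown_word_ratio(text: str, known_words: set) -> float:
    """Fraction of words in text that are missing from known_words."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 1.0  # nothing recognizable at all counts as fully unknown
    unknown = sum(1 for word in words if word not in known_words)
    return unknown / len(words)


def looks_like_spam(text: str, known_words: set = KNOWN_WORDS,
                    max_unknown: float = 0.5) -> bool:
    """Flag replies that contain link markup or mostly non-dictionary words."""
    lowered = text.lower()
    if any(marker in lowered for marker in LINK_MARKERS):
        return True
    return unknown_word_ratio(text, known_words) > max_unknown
```

Names, slang and typos all count as unknown words, so a threshold that is too strict will throw out legitimate replies. That is a big part of why the trained filters mentioned above exist.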
Answer 3:
reCAPTCHA and strong moderation, nothing else. Should reduce spam to literally none.
Answer 4:
A few answers advised reCAPTCHA, but "in fact, it [reCAPTCHA] became pretty useless".
It just discredits the original concept of CAPTCHA.
I would advise more flexible approaches to CAPTCHA-ing visitors.
Source: https://stackoverflow.com/questions/4718747/how-can-i-cut-down-on-this-spam-and-what-is-the-point-of-it-anyway